Nucleic acid molecules specific for bacterial
antigens and uses thereof.

TECHNICAL FIELD 

The invention relates to novel nucleotide sequences 
located in a gene cluster which controls the synthesis of 
a bacterial polysaccharide antigen, especially an O 
antigen, and the use of those nucleotide sequences for the 
detection of bacteria which express particular 
polysaccharide antigens (particularly O antigens) and for 
the identification of the polysaccharide antigens 
(particularly O antigens) of those bacteria. 

BACKGROUND ART 

Enteropathogenic E. coli strains are well known
causes of diarrhoea and haemorrhagic colitis in humans and
can lead to potentially life-threatening sequelae
including haemolytic uremic syndrome and thrombotic
thrombocytopaenic purpura. Some of these strains are
commonly found in livestock, and infection in humans is
usually a consequence of consumption of contaminated meat
or dairy products which have been improperly processed.
The O-specific polysaccharide component (the "O antigen")
of lipopolysaccharide is known to be a major virulence
factor of enteropathogenic E. coli strains.

The E. coli O antigen is highly polymorphic and 166
different forms of the antigen have been defined. Ewing,
W. H. [in Edwards and Ewing's "Identification of the
Enterobacteriaceae", Elsevier, Amsterdam (1986)] discusses
128 different O antigens, while Lior H. (1994) extends the
number to 166 [in "Classification of Escherichia coli", in
"Escherichia coli in domestic animals and humans", pp. 31-72,
edited by C.L. Gyles, CAB International]. The species
Salmonella enterica has 46 known O antigen types [Popoff
M.Y. et al (1992) "Antigenic formulas of the Salmonella
enterica serovars" 6th revision, WHO Collaborating Centre
for Reference and Research on Salmonella enterica, Institut
Pasteur, Paris, France].
An important step in determining the biosynthesis of
O antigens, and therefore the mechanism of the polymorphism,
has been to characterise the gene clusters controlling O
antigen biosynthesis. The genes specific for the
synthesis of the O antigen are generally located in a gene
cluster at map position 45 minutes on the chromosome of E.
coli K-12 [Bachmann, B. J. (1990) "Linkage map of
Escherichia coli K-12". Microbiol. Rev. 54: 130-197], and
at the corresponding position in S. enterica LT2
[Sanderson et al (1995) "Genetic map of Salmonella
enterica typhimurium", Edition VIII, Microbiol. Rev. 59:
241-303]. In both cases the O antigen gene cluster is
close to the gnd gene, as is the case in other strains of
E. coli and S. enterica [Reeves P.R. (1994) "Biosynthesis
and assembly of lipopolysaccharide", pp. 281-314 in A.
Neuberger and L.L.M. van Deenen (eds) "Bacterial cell
wall", New Comprehensive Biochemistry vol 27, Elsevier
Science Publishers]. These genes encode enzymes for the
synthesis of nucleotide diphosphate sugars, for
assembly of the sugars into oligosaccharide units and, in
general, for polymerisation to O antigen.

The E. coli O antigen gene clusters for a wide range
of E. coli O antigens have been cloned, but the O7, O9, O16
and O111 O antigens have been studied in more detail, with
only O9 and O16 having been fully characterised with
regard to nucleotide sequence to date [Kido N., Torgov
V.I., Sugiyama T., Uchiya K., Sugihara H., Komatsu T.,
Kato N. & Jann K. (1995) "Expression of the O9
polysaccharide of Escherichia coli: sequencing of the E.
coli O9 rfb gene cluster, characterisation of mannosyl
transferases, and evidence for an ATP-binding cassette
transport system" J. Bacteriol. 177: 2178-2187;
Stevenson G., Neal B., Liu D., Hobbs M., Packer N.H.,
Batley M., Redmond J.W., Lindquist L. & Reeves P.R. (1994)
"Structure of the O antigen of E. coli K12 and the
sequence of its rfb gene cluster" J. Bacteriol. 176:
4144-4156; Jayaratne, P. et al. (1991) "Cloning and
analysis of duplicated rfbM and rfbK genes involved in the
formation of GDP-mannose in Escherichia coli O9:K30 and
participation of rfb genes in the synthesis of the group 1
K30 capsular polysaccharide" J. Bacteriol. 176: 3126-3139;
Valvano, M. A. and Crosa, J. H. (1989) "Molecular cloning
and expression in Escherichia coli K-12 of chromosomal
genes determining the O7 lipopolysaccharide antigen of a
human invasive strain of E. coli O7:K1". Infect. Immun.
57: 937-943; Marolda C. L. and Valvano, M. A. (1993)
"Identification, expression, and DNA sequence of the GDP-mannose
biosynthesis genes encoded by the O7 rfb gene
cluster of strain VW187 (Escherichia coli O7:K1)". J.
Bacteriol. 175: 148-158].

Bastin D.A., et al. 1991 ["Molecular cloning and
expression in Escherichia coli K-12 of the rfb gene
cluster determining the O antigen of an E. coli O111
strain". Mol. Microbiol. 5(9): 2223-2231] and Bastin D.A.
and Reeves, P.R. [(1995) "Sequence and analysis of the O
antigen gene (rfb) cluster of Escherichia coli O111". Gene
164: 17-23] isolated chromosomal DNA encoding the E. coli
O111 rfb region and characterised a 6962 bp fragment of E.
coli O111 rfb. Six open reading frames (orfs) were
identified in the 6962 bp partial fragment, and the
alignment of the sequences of these orfs revealed homology
with genes of the GDP-mannose pathway, rfbK and rfbM, and
other rfb and cps genes.

The nucleotide sequences of the loci which control
expression of Salmonella enterica B, A, D1, D2, D3, C1, C2
and E O antigens have been characterised [Brown, P. K., L.
K. Romana and P. R. Reeves (1991) "Cloning of the rfb gene
cluster of a group C2 Salmonella enterica: comparison with
the rfb regions of groups B and D" Mol. Microbiol. 5: 1873-1881;
Jiang, X.-M., B. Neal, F. Santiago, S. J. Lee, L. K.
Romana, and P. R. Reeves (1991) "Structure and sequence
of the rfb (O antigen) gene cluster of Salmonella enterica
serovar typhimurium (LT2)". Mol. Microbiol. 5: 692-713;
Lee, S. J., L. K. Romana, and P. R. Reeves (1992)
"Sequences and structural analysis of the rfb (O
antigen) gene cluster from a group C1 Salmonella enterica
strain" J. Gen. Microbiol. 138: 1843-1855; Liu,
D., N. K. Verma, L. K. Romana, and P. R. Reeves (1991)
"Relationship among the rfb regions of Salmonella enterica
serovars A, B and D" J. Bacteriol. 173: 4814-4819; Verma,
N. K., and P. Reeves (1989) "Identification and sequence
of rfbS and rfbE, which determine the antigenic
specificity of group A and group D salmonellae"
J. Bacteriol. 171: 5694-5701; Wang, L., L. K. Romana, and
P. R. Reeves (1992) "Molecular analysis of a Salmonella
enterica group E1 rfb gene cluster: O antigen and
the genetic basis of the major polymorphism" Genetics
130: 429-443; Wyk, P., and P. Reeves (1989)
"Identification and sequence of the gene for abequose
synthase, which confers antigenic specificity on group B
salmonellae: homology with galactose epimerase"
J. Bacteriol. 171: 5687-5693; Xiang, S. H., M. Hobbs, and
P. R. Reeves (1994) "Molecular analysis of the rfb gene
cluster of a group D2 Salmonella enterica strain: evidence
for its origin from an insertion sequence-mediated
recombination event between group E and D1 strains". J.
Bacteriol. 176: 4357-4365; Curd, H., D. Liu and P. R.
Reeves (1998) "Relationships among the O antigen Salmonella
enterica groups B, D1, D2, and D3". J. Bacteriol. 180:
1002-1007].

Of the closely related Shigella (which can really be
considered to be part of E. coli), the S. dysenteriae and S.
flexneri O antigen gene clusters have been fully sequenced
and lie next to gnd [Klena J.D. & Schnaitman C.A. (1993)
"Function of the rfb gene cluster and the rfe gene in the
synthesis of O antigen by Shigella dysenteriae 1" Mol.
Microbiol. 9: 393-402; Morona R., Mavris M., Fallarino A. &
Manning P. (1994) "Characterisation of the rfc region of
Shigella flexneri" J. Bacteriol. 176: 733-747].

Inasmuch as the O antigen of enteropathogenic E. coli
strains and the O antigen of Salmonella enterica strains
are major virulence factors and are highly polymorphic,
there is a real need to develop highly specific,
sensitive, rapid and inexpensive diagnostic assays to
detect E. coli and assays to detect S. enterica. There is
also a real need to develop diagnostic assays to identify
the O antigens of E. coli strains and assays to identify
the O antigens of S. enterica strains. With regard to the
detection of E. coli, these needs extend beyond EHEC
(enterohaemorrhagic E. coli) strains, but this is the area
of greatest need. There is also interest in diagnostics for
ETEC (enterotoxigenic E. coli) and other groups of E. coli.

The first diagnostic systems employed in this field
used large panels of antisera raised against E. coli O
antigen expressing strains or S. enterica O antigen
expressing strains. This technology has inherent
difficulties associated with the preparation, storage and
usage of the reagents, as well as the time required to
achieve a meaningful diagnostic result.

Nucleotide sequences derived from the O antigen gene
clusters of S. enterica strains have been used to
determine S. enterica O antigens in a PCR assay [Luk,
J.M.C. et al. (1993) "Selective amplification of abequose
and paratose synthase genes (rfb) by polymerase chain
reaction for identification of S. enterica major serogroups
(A, B, C2, and D)", J. Clin. Microbiol. 31: 2118-2123].
The prior complete nucleotide sequence characterisation of
the entire rfb locus of serovars Typhimurium, Paratyphi A,
Typhi, Muenchen and Anatum, representing groups B, A, D1,
C2 and E1 respectively, enabled Luk et al. to select
oligonucleotide primers specific for those serogroups.
Thus the approach of Luk et al. was based on aligning
known nucleotide sequences corresponding to CDP-abequose
and CDP-paratose synthesis genes within the O antigen
regions of S. enterica serogroups E1, D1, A, B and C2 and
exploiting the observed nucleotide sequence differences in
order to identify serotype-specific oligonucleotides.
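The alignment-based strategy attributed to Luk et al. above can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length (gaps written as "-") and simply lists the alignment columns at which a chosen serogroup carries a base not shared by any other sequence; such columns are candidate positions for a serogroup-specific primer. The function name `diagnostic_columns` and the toy alignment are hypothetical and are not taken from Luk et al.

```python
def diagnostic_columns(alignment: dict, target: str) -> list:
    """Return 1-based alignment columns where `target` has a base absent from all other sequences."""
    length = len(alignment[target])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for i in range(length):
        base = alignment[target][i]
        if base == "-":  # gap positions are not useful as primer positions
            continue
        others = [seq[i] for name, seq in alignment.items() if name != target]
        if all(other != base for other in others):
            columns.append(i + 1)
    return columns


# Toy (hypothetical) alignment; not real rfb sequence data.
toy_alignment = {
    "group_A": "ATGCTAGCTA",
    "group_B": "ATGCTTGCTA",
    "group_D": "ATGATAGCTA",
}
print(diagnostic_columns(toy_alignment, "group_B"))
```

In practice a single differing column is rarely enough; a primer would normally be placed so that one or more such positions fall at or near its 3' end.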
In an attempt to determine the O antigen serotype of
a Shiga-like toxin producing E. coli strain, Paton, A. W.,
et al. 1996 ["Molecular microbiological investigation of
an outbreak of Hemolytic-Uremic Syndrome caused by dry
fermented sausage contaminated with Shiga-like toxin-producing
Escherichia coli". J. Clin. Microbiol. 34: 1622-1627],
used oligonucleotides derived from the wbdI (orf6)
region, which were believed to be specific to the E. coli
O111 antigen and which were derived from E. coli O111
sequence, in a PCR diagnostic assay. Unpublished reports
indicate that the approach of Paton et al. is deficient in
that the nucleotide sequences derived from wbdI may not
specifically identify the O111 antigen and in fact lead to
false positive results. Paton et al.
disclose the detection of 5 O111 antigen isolates by PCR,
when in fact from only 3 of those isolates did they detect
bacteria which reacted with O111-specific antiserum.



DESCRIPTION OF THE INVENTION

Whilst not wanting to be held to a particular
hypothesis, the present inventors now believe that the
false positives reported with the Paton et al.
method are due to the fact that the nucleic acid molecules
employed by Paton et al. were derived from a gene which
has a putative function as a sugar pathway gene [Bastin
D.A. and Reeves, P.R. (1995) Sequence and analysis of the
O antigen gene (rfb) cluster of Escherichia coli O111. Gene
164: 17-23], and which they now believe to lack the necessary
nucleotide sequence specificity to identify the E. coli O
antigen. The inventors now believe that many of the
nucleic acid molecules derived from sugar pathway genes
expressed in S. enterica or other enterobacteria are also
likely to lack the necessary nucleotide sequence
specificity to identify specific O antigens or specific
serotypes.

In this regard it is important to note that the genes 
for the synthesis of a polysaccharide antigen include 
those related to the synthesis of the sugars present in 
the antigen (sugar pathway genes) and those related to the
manipulation of those sugars to form the polysaccharide.
The present invention is predominantly concerned with the
latter group of genes, particularly the assembly and
transport genes such as transferase, polymerase and
flippase genes. 

The present inventors have surprisingly found that 
the use of nucleic acid molecules derived from particular 
assembly and transport genes, particularly transferase, 
wzx and wzy genes, within O antigen gene clusters can 
improve the specificity of the detection and 
identification of O antigens. The present inventors 
believe that the invention is not necessarily limited to 
the detection of the particular O antigens which are 
encoded by the nucleic acid molecules exemplified herein, 
but has broad application for the detection of bacteria 
which express an O antigen and the identification of O 
antigens in general. Further because of the similarities 
between the gene clusters involved in the synthesis of O 
antigens and other polymorphic polysaccharide antigens, 
such as bacterial capsular antigens, the inventors believe 
that the methods and molecules of the present invention 
are also applicable to these other polysaccharide 
antigens.

Accordingly, in one aspect the present invention 
relates to the identification of nucleic acid molecules 
which are useful for the detection and identification of 
specific bacterial polysaccharide antigens. 
The invention provides a nucleic acid molecule
derived from: a gene encoding a transferase; or a gene
encoding an enzyme for the transport or processing of a
polysaccharide or oligosaccharide unit, including a wzx
gene, wzy gene, or a gene with a similar function; the
gene being involved in the synthesis of a particular
bacterial polysaccharide antigen,

wherein the sequence of the nucleic acid molecule is
specific to the particular bacterial polysaccharide
antigen.

Polysaccharide antigens, such as the capsular antigens of
E. coli (Type I and Type II), the virulence (Vi) capsule of
S. enterica sv Typhi and the capsules of species such as
Streptococcus pneumoniae and Staphylococcus albus, are
encoded by genes which include nucleotide sugar pathway
genes, sugar transferase genes and genes for the transport
and processing of the polysaccharide or oligosaccharide
unit. In some cases these are wzx or wzy genes, but in other
cases they are quite different because a different
processing pathway is used. Examples of other gene
clusters include the gene clusters for an extracellular
polysaccharide of Streptococcus thermophilus, an
exopolysaccharide of Rhizobium meliloti and the K2
capsule of Klebsiella pneumoniae. These all have genes
which, by experimental analysis, comparison of nucleotide
sequence or predicted protein structure, can be seen to
include nucleotide sugar pathway genes, sugar transferase
genes and genes for oligosaccharide or polysaccharide
processing.

In the case of the E. coli K-12 colanic acid capsule
gene cluster [Stevenson et al (1996) "Organization of the
Escherichia coli K-12 gene cluster responsible for
production of the extracellular polysaccharide colanic
acid". J. Bacteriol. 178: 4885-4893], genes from the three
classes were identified either provisionally or
definitively. The colanic acid capsule is classified with the
Type I capsules of E. coli.

The present inventors believe that, in general, 
transferase genes and genes for oligosaccharide processing 
will be more specific for a given capsule than the genes 
coding for the nucleotide sugar synthetic pathways, as most
sugars present in such capsules occur in the capsules of
different serotypes. Thus the nucleotide sugar synthesis
pathway genes could now be predicted to be common to more 
than one capsule type. 

As elaborated below, the present inventors recognise
that there may be polysaccharide antigen gene clusters
which share transferase genes and/or genes for
oligosaccharide or polysaccharide processing, so that
completely random selection of nucleotide sequences from
within these genes may still lead to cross-reaction; an
example with respect to capsular antigens is provided by
the E. coli type II capsules, for which only transferase
genes are sufficiently specific. However, in light of their
current results the present inventors nonetheless
consider the transferase genes or genes controlling
oligosaccharide or polysaccharide processing to be
superior targets for nucleotide sequence selection for the
specific detection and characterisation of polysaccharide
antigen types. Thus, where there is similarity between
particular genes, selection of nucleotide sequences from
within other transferase genes or genes for
oligosaccharide or polysaccharide processing from within
the relevant gene cluster will still provide specificity,
or alternatively the use of combinations of nucleotide
sequences will provide the desired specificity. The
combinations of nucleotide sequences may include
nucleotide sequences derived from pathway genes together
with nucleotide sequences derived from transferase, wzx or
wzy genes.

Thus the invention also provides a panel of nucleic 
acid molecules wherein the nucleic acid molecules are 
derived from a combination of genes encoding transferases 
and/or enzymes for the transport or processing of a 
polysaccharide or oligosaccharide unit including wzx or 
wzy genes; wherein the combination of genes is specific to 
the synthesis of a particular bacterial polysaccharide 
antigen and wherein the panel of nucleic acid molecules is 
specific to a bacterial polysaccharide antigen. In 
another preferred form, the nucleic acid molecules are 
derived from a combination of genes encoding transferases 
and/or enzymes for the transport or processing of a 
polysaccharide or oligosaccharide unit including wzx or 
wzy genes, together with nucleic acid molecules derived 
from pathway genes.

In a second aspect the present invention relates to 
the identification of nucleic acid molecules which are
useful for the detection of bacteria which express O
antigens and for the identification of the O antigens of
those bacteria in diagnostic assays.
The invention provides a nucleic acid molecule
derived from: a gene encoding a transferase; or a gene
encoding an enzyme for the transport or processing of a
polysaccharide or oligosaccharide unit such as a wzx or
wzy gene, the gene being involved in the synthesis of a
particular bacterial O antigen, wherein the sequence of
the nucleic acid molecule is specific to the particular
bacterial O antigen.

The nucleic acids of the invention may vary in
length. In one embodiment they are from about 10 to about
20 nucleotides in length.
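As an illustration of how candidate molecules in the 10 to 20 nucleotide range might be enumerated from a gene region, the Python sketch below slides a window of each length across a region and keeps oligonucleotides whose melting temperature, estimated with the Wallace rule (Tm ≈ 2 °C per A or T plus 4 °C per G or C, a rough approximation suited only to short oligonucleotides), falls in a chosen window. The function names and the 40-60 °C default window are assumptions made for illustration, not parameters of the invention.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def candidate_oligos(region: str, min_len: int = 10, max_len: int = 20,
                     tm_window: tuple = (40, 60)) -> list:
    """List (1-based start, oligo, Tm) for every window of min_len..max_len nt whose Tm is in tm_window."""
    region = region.upper()
    low, high = tm_window
    hits = []
    for k in range(min_len, max_len + 1):
        for start in range(len(region) - k + 1):
            oligo = region[start:start + k]
            tm = wallace_tm(oligo)
            if low <= tm <= high:
                hits.append((start + 1, oligo, tm))
    return hits


# Hypothetical region; a real run would use a slice of a transferase, wzx or wzy gene.
for start, oligo, tm in candidate_oligos("ATGGCTTACGGATCCAAGCTTGCAT")[:5]:
    print(start, oligo, tm)
```

This step only filters by length and approximate Tm; each surviving candidate would still have to be checked for specificity against other O antigen gene clusters.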

In one preferred embodiment, the invention provides a
nucleic acid molecule derived from: a gene encoding a
transferase; or a gene encoding an enzyme for the
transport or processing of a polysaccharide or
oligosaccharide unit including a wzx or wzy gene, the gene
being involved in the synthesis of an O antigen expressed
by E. coli, wherein the sequence of the nucleic acid
molecule is specific to the O antigen.
In one more preferred embodiment, the sequence of the
nucleic acid molecule is specific to the nucleotide
sequence encoding the O111 antigen (SEQ ID NO:1). More
preferably, the sequence is derived from a gene selected
from the group consisting of wbdH (nucleotide position 739
to 1932 of SEQ ID NO:1), wzx (nucleotide position 8646 to
9911 of SEQ ID NO:1), wzy (nucleotide position 9901 to
10953 of SEQ ID NO:1), wbdM (nucleotide position 11821 to
12945 of SEQ ID NO:1) and fragments of those molecules of
at least 10-12 nucleotides in length. Particularly
preferred nucleic acid molecules are those set out in
Tables 5 and 5A with respect to the above-mentioned genes.
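The gene positions above are given as 1-based, inclusive nucleotide positions within SEQ ID NO:1. A minimal Python sketch of converting such positions into sub-sequences, together with a simple length check, is shown below. For each of these four genes the interval is a whole number of codons; for example, wbdH spans 1932 - 739 + 1 = 1194 nucleotides, or 398 codons. SEQ ID NO:1 itself is not reproduced here, so the sequence argument is a placeholder, and the helper names are hypothetical.

```python
# Gene positions (1-based, inclusive) within SEQ ID NO:1, as listed above.
O111_GENES = {
    "wbdH": (739, 1932),
    "wzx": (8646, 9911),
    "wzy": (9901, 10953),
    "wbdM": (11821, 12945),
}


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return the sub-sequence for 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]  # Python slices are 0-based and end-exclusive


def region_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive interval."""
    return end - start + 1


for gene, (start, end) in O111_GENES.items():
    length = region_length(start, end)
    print(gene, length, "whole codons" if length % 3 == 0 else "not a multiple of 3")
```

Keeping the conversion in one helper avoids the common off-by-one error of slicing `sequence[start:end]` directly with 1-based coordinates.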

In another more preferred embodiment, the sequence of
the nucleic acid molecule is specific to the nucleotide
sequence encoding the O157 antigen (SEQ ID NO:2). More
preferably the sequence is derived from a gene selected
from the group consisting of wbdN (nucleotide position 79
to 861 of SEQ ID NO:2), wbdO (nucleotide position 2011 to
2757 of SEQ ID NO:2), wbdP (nucleotide position 5257 to
6471 of SEQ ID NO:2), wbdR (nucleotide position 13156 to
13821 of SEQ ID NO:2), wzx (nucleotide position 2744 to 4135
of SEQ ID NO:2) and wzy (nucleotide position 858 to 2042 of
SEQ ID NO:2). Particularly preferred nucleic acid molecules
are those set out in Tables 6 and 6A.
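Several of the O157 intervals listed above overlap their neighbours: wbdN ends at 861 while wzy starts at 858, wzy ends at 2042 while wbdO starts at 2011, and wbdO ends at 2757 while wzx starts at 2744. A probe placed in such a shared stretch could not be attributed to a single gene. The Python sketch below, with a hypothetical helper name, lists every pair of overlapping intervals and the number of shared nucleotides; it works only on the coordinates and makes no assumption about strand or reading frame.

```python
from itertools import combinations

# Gene positions (1-based, inclusive) within SEQ ID NO:2, as listed above.
O157_GENES = {
    "wbdN": (79, 861),
    "wbdO": (2011, 2757),
    "wbdP": (5257, 6471),
    "wbdR": (13156, 13821),
    "wzx": (2744, 4135),
    "wzy": (858, 2042),
}


def overlapping_pairs(genes: dict) -> list:
    """Return (gene_a, gene_b, shared_nt) for every pair of overlapping intervals."""
    ordered = sorted(genes.items(), key=lambda item: item[1][0])  # sort by start position
    pairs = []
    for (name_a, (start_a, end_a)), (name_b, (start_b, end_b)) in combinations(ordered, 2):
        shared_start, shared_end = max(start_a, start_b), min(end_a, end_b)
        if shared_start <= shared_end:
            pairs.append((name_a, name_b, shared_end - shared_start + 1))
    return pairs


print(overlapping_pairs(O157_GENES))
# [('wbdN', 'wzy', 4), ('wzy', 'wbdO', 32), ('wbdO', 'wzx', 14)]
```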

The invention also provides in a further preferred 
embodiment a nucleic acid molecule derived from: a gene 
encoding a transferase; or a gene encoding an enzyme for 
the transport or processing of a polysaccharide or 
oligosaccharide unit including a wzx or wzy gene; the gene 
being involved in the synthesis of an O antigen expressed 
by Salmonella enterica, wherein the sequence of the
nucleic acid molecule is specific to the O antigen. 

In one more preferred form of this embodiment, the 
sequence of the nucleic acid molecule is specific to the 
nucleotide sequence encoding the S. enterica C2 antigen
(SEQ ID NO:3). More preferably the sequence of the
nucleic acid molecule is derived from a gene selected from
the group consisting of wbaR (nucleotide position 2352 to
3314 of SEQ ID NO:3), wbaL (nucleotide position 3361 to
3875 of SEQ ID NO:3), wbaQ (nucleotide position 3977 to
5020 of SEQ ID NO:3), wbaW (nucleotide position 6313 to
7323 of SEQ ID NO:3), wbaZ (nucleotide position 7310 to
8467 of SEQ ID NO:3), wzx (nucleotide position 1019 to
2359 of SEQ ID NO:3) and wzy (nucleotide position 5114 to
6313 of SEQ ID NO:3). Particularly preferred nucleic acid
molecules are those set out in Table 7.

In another more preferred form of this embodiment, 
the sequence of the nucleic acid molecule is specific to 
the nucleotide sequence encoding the S. enterica B antigen
(SEQ ID NO:4). More preferably the sequence is derived
from wzx (nucleotide position 12762 to 14054 of SEQ ID
NO:4) or wbaV (nucleotide position 14059 to 15060 of SEQ
ID NO:4). Particularly preferred nucleic acid molecules
are those set out in Table 8 which are derived from the wzx
and wbaV genes.

In a further more preferred form of this embodiment, 
the sequence of the nucleic acid molecule is specific to 
the S. enterica D3 O antigen and is derived from the wzy
gene.

In yet a further preferred form of this embodiment,
the sequence of the nucleic acid molecule is specific to
the S. enterica E1 O antigen and is derived from the wzx
gene.

While transferase genes, or genes coding for the 
transport or processing of a polysaccharide or 
oligosaccharide unit, such as a wzx or wzy gene, are 
superior targets for specific detection of individual O 
antigen types there may well be individual genes or parts 
of them within this group that can be demonstrated to be 
the same or closely related between different O antigen 
types such that cross-reactions can occur. Cross-reactions
should be avoided by the selection of a
different target within the group or the use of multiple 
targets within the group. 

Further, it is recognised that there are cases where 
O antigen gene clusters have arisen from recombination of 
at least two strains such that the unique O antigen type 
is provided by a combination of gene products shared with 
at least two other O antigen types. The recognised 
example of this phenomenon is the S. enterica O antigen
serotype D2, which has genes from D1 and E1 but none unique
to D2. In these circumstances the detection of the O
antigen type can still be achieved in accordance with the
invention, but requires the use of a combination of
nucleic acid molecules to detect a specific combination of
genes that exists only in that particular O antigen gene
cluster.

Thus, the invention also provides a panel of nucleic
acid molecules wherein the nucleic acid molecules are
derived from genes encoding transferases and/or enzymes
for the transport or processing of a polysaccharide or
oligosaccharide unit including wzx or wzy genes, wherein
the panel of nucleic acid molecules is specific to a
bacterial O antigen. Preferably the particular bacterial
O antigen is expressed by S. enterica. More preferably,
the panel of nucleic acid molecules is specific to the D2
O antigen and is derived from the E1 wzy gene and the D1
wzx gene.
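The combinatorial logic of such a panel can be sketched in Python as a set comparison: a group is reported only when every target in its required combination has been detected. The target labels ("E1_wzy", "D1_wzx") and the function name are hypothetical. The sketch assumes a pure culture, since a mixed sample containing both a D1 strain and an E1 strain would also yield both signals.

```python
def matching_groups(detected: set, panel: dict) -> list:
    """Return the groups whose required targets were all detected."""
    return sorted(group for group, required in panel.items() if required <= detected)


# Hypothetical panel: D2 is called only when both targets are present together.
D2_PANEL = {"D2": frozenset({"E1_wzy", "D1_wzx"})}

print(matching_groups({"E1_wzy", "D1_wzx"}, D2_PANEL))  # ['D2']
print(matching_groups({"D1_wzx"}, D2_PANEL))            # []
```

Representing each required combination as a `frozenset` lets the same function handle panels with more groups or more targets per group without changing the logic.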

The combinations of nucleotide sequences may include 
nucleotide sequences derived from pathway genes, together 
with nucleotide sequences derived from transferase, wzx or 
wzy genes.

Thus, the invention also provides a panel of nucleic 
acid molecules, wherein the nucleic acid molecules are 
derived from genes encoding transferases and/or enzymes
for the transport or processing of a polysaccharide or
oligosaccharide unit including wzx or wzy genes, and sugar
pathway genes, wherein the panel of nucleic acid molecules
is specific to a particular bacterial O antigen.
Preferably the O antigen is expressed by S. enterica.

Further it is recognised that there may be instances 
where spurious hybridisation will arise through initial 
selection of a sequence found in many different genes but 
this is typically recognisable by, for instance, 
comparison of band sizes against controls in PCR gels, and 
an alternative sequence can be selected. 
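Comparison of band sizes against controls can be expressed as a simple tolerance check. In the Python sketch below, observed band sizes (in base pairs) are split into those consistent with the expected product size and those that are not, the latter flagging possible spurious hybridisation. The 5% tolerance and the function name are assumptions for illustration; a suitable tolerance depends on the gel system and size markers used.

```python
def classify_bands(observed_bp: list, expected_bp: int, tolerance: float = 0.05) -> tuple:
    """Split observed band sizes into (matching, unexpected) relative to the expected size."""
    matching, unexpected = [], []
    for size in observed_bp:
        if abs(size - expected_bp) <= tolerance * expected_bp:
            matching.append(size)
        else:
            unexpected.append(size)
    return matching, unexpected


# Hypothetical gel reading: two bands near 500 bp and one at 900 bp.
print(classify_bands([500, 510, 900], expected_bp=500))  # ([500, 510], [900])
```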

The present inventors believe that based on the 
teachings of the present invention and available 
information concerning polysaccharide antigen gene 
clusters (including O antigen gene clusters), and through
use of experimental analysis, comparison of nucleic acid 
sequences or predicted protein structures, nucleic acid 
molecules in accordance with the invention can be readily 
derived for any particular polysaccharide antigen of 
interest. Suitable bacterial strains can typically be 
acquired commercially from depositary institutions. 

As mentioned above, there are currently 166 defined E.
coli O antigens, while S. enterica has 46 known O
antigen types [Popoff M.Y. et al (1992) "Antigenic
formulas of the Salmonella serovars" 6th revision, WHO
Collaborating Centre for Reference and Research on
Salmonella, Institut Pasteur, Paris, France]. Many other
genera of bacteria are known to have O antigens, and these
include Citrobacter, Shigella, Yersinia, Plesiomonas,
Vibrio and Proteus.

Samples of the 166 different E. coli O antigen 
serotypes are available from Statens Serum Institut, 
Copenhagen, Denmark. 

The 46 S. enterica serotypes are available from
Institute of Medical and Veterinary Science, Adelaide, 
Australia. 

In another aspect, the invention relates to a method 
of testing a sample for the presence of one or more 
bacterial polysaccharide antigens comprising contacting 
the sample with at least one oligonucleotide molecule 
capable of specifically hybridising to: (i) a gene 
encoding a transferase, or (ii) a gene encoding an enzyme 
for transport or processing of oligosaccharide or
polysaccharide units, including a wzx or wzy gene; wherein
said gene is involved in the synthesis of the bacterial
polysaccharide antigen; under conditions suitable to
permit the at least one oligonucleotide molecule to
specifically hybridise to at least one such gene of any
bacteria expressing the particular bacterial
polysaccharide antigen present in the sample and detecting
any specifically hybridised oligonucleotide molecules.
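Before laboratory testing, the specificity of a candidate oligonucleotide can be screened in silico. The Python sketch below counts exact matches of a probe on both strands of each sequence and treats the probe as specific if it matches every target sequence and none of the non-target sequences. This is only a crude proxy: real hybridisation tolerates some mismatches and depends on the stringency conditions, and the function names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(probe: str, sequence: str) -> int:
    """Count exact, possibly overlapping, matches of the probe on both strands."""
    probe, sequence = probe.upper(), sequence.upper()

    def count(pattern: str) -> int:
        n, i = 0, sequence.find(pattern)
        while i != -1:
            n += 1
            i = sequence.find(pattern, i + 1)
        return n

    hits = count(probe)
    rc = reverse_complement(probe)
    if rc != probe:  # a palindromic probe would otherwise be counted twice
        hits += count(rc)
    return hits


def is_specific(probe: str, targets: list, non_targets: list) -> bool:
    """True if the probe matches every target and no non-target sequence."""
    return (all(count_sites(probe, t) > 0 for t in targets)
            and all(count_sites(probe, n) == 0 for n in non_targets))


# Toy sequences only; real use would compare against whole O antigen gene clusters.
print(is_specific("ATG", ["CATGCAT"], ["GGGG"]))  # True
```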

Where a single specific oligonucleotide molecule is
unavailable, a combination of molecules hybridising
specifically to the target region may be used. Thus the
invention provides a panel of nucleic acid molecules for
use in the method of testing of the invention, wherein the
nucleic acid molecules are derived from genes encoding
transferases and/or enzymes for the transport or
processing of a polysaccharide or oligosaccharide unit
including wzx or wzy genes, wherein the panel of nucleic
acid molecules is specific to a particular bacterial
polysaccharide. The panel of nucleic acid molecules can
include nucleic acid molecules derived from sugar pathway
genes where necessary.

In another aspect, the invention relates to a method 
of testing a sample for the presence of one or more 
bacterial polysaccharide antigens comprising contacting
the sample with at least one pair of oligonucleotide
molecules, with at least one oligonucleotide molecule of
the pair capable of specifically hybridising to: (i) a
gene encoding a transferase, or (ii) a gene encoding an
enzyme for transport or processing of oligosaccharide or
polysaccharide units, including a wzx or wzy gene; wherein
said gene is involved in the synthesis of the bacterial
polysaccharide antigen; under conditions suitable to
permit the at least one oligonucleotide molecule of the
pair of molecules to specifically hybridise to at least
one such gene of any bacteria expressing the particular
bacterial polysaccharide antigen present in the sample and
detecting any specifically hybridised oligonucleotide
molecules.

The pair of oligonucleotide molecules may both
hybridise to the same gene or to different genes. Only
one oligonucleotide molecule of the pair need hybridise
specifically to a sequence specific for the particular
antigen type. The other molecule can hybridise to a
non-specific region.
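Where a pair of oligonucleotides is used as PCR primers, the size of the expected product can be predicted from a known template and used as the reference band size. The Python sketch below assumes the forward primer matches the given strand exactly and the reverse primer matches the opposite strand, so that its reverse complement is found downstream of the forward primer. Mismatch tolerance and multiple binding sites are not handled, and the function name and toy template are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_product(template: str, forward: str, reverse: str) -> Optional[int]:
    """Predicted PCR product length in nt, or None if either primer site is absent."""
    template = template.upper()
    f_pos = template.find(forward.upper())
    if f_pos == -1:
        return None
    r_site = reverse_complement(reverse.upper())  # reverse primer site on this strand
    r_pos = template.find(r_site, f_pos + len(forward))
    if r_pos == -1:
        return None
    return r_pos + len(r_site) - f_pos  # from the forward primer start to the end of the reverse site


toy_template = "CCATGGCAAATTTGGGCCCTTAAGG"
print(expected_product(toy_template, "ATGGCA", "TAAGGG"))  # 20
```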

Where the particular polysaccharide antigen gene 
cluster has arisen through recombination, the at least one 
pair of oligonucleotide molecules may be selected to be 
capable of hybridising to a specific combination of genes
in the cluster specific to that polysaccharide antigen, or
multiple pairs may be selected to provide hybridisation to
the specific combination of genes. Even where all the
genes in a particular cluster are unique, the method may
be carried out using oligonucleotide molecules which recognise
a combination of genes within the cluster.

Thus the invention provides a panel containing pairs 
of nucleic acid molecules for use in the method of testing 
of the invention, wherein the pairs of nucleic acid 
molecules are derived from genes encoding transferases
and/or enzymes for the transport or processing of a
polysaccharide or oligosaccharide unit including wzx or
wzy genes, wherein the panel of nucleic acid molecules is
specific to a particular bacterial polysaccharide antigen.
The panel of nucleic acid molecules can include pairs of 
nucleic acid molecules derived from sugar pathway genes 
where necessary. 

In another aspect, the invention relates to a method 
of testing a sample for the presence of one or more 
particular bacterial O antigens comprising contacting the 
sample with at least one oligonucleotide molecule capable 
of specifically hybridising to: (i) a gene encoding an O 
antigen transferase, or (ii) a gene encoding an enzyme for 
transport or processing of the oligosaccharide or 
polysaccharide unit, including a wzx or wzy gene; wherein 
said gene is involved in the synthesis of the particular O 
antigen; under conditions suitable to permit the at least 
one oligonucleotide molecule to specifically hybridise to 
at least one such gene of any bacteria expressing the 
particular bacterial O antigen present in the sample and 
detecting any specifically hybridised oligonucleotide 
molecules. Preferably the bacteria are E. coli or S.
enterica. More preferably, the E. coli express the O157
serotype or the O111 serotype. More preferably, the S.
enterica express the C2 or B serotype. Preferably, the
method is a Southern blot method. More preferably, the
nucleic acid molecule is labelled and hybridisation of the
nucleic acid molecule is detected by autoradiography or
detection of fluorescence.

The inventors envisage circumstances where a single 
specific oligonucleotide molecule is unavailable. In 
these circumstances a combination of molecules hybridising 
specifically to the target region may be used. Thus the 
invention provides a panel of nucleic acid molecules for 
use in the method of testing of the invention, wherein the 
nucleic acid molecules are derived from genes encoding 
transferases and/or enzymes for the transport or 
processing of a polysaccharide or oligosaccharide unit 
including wzx or wzy genes, wherein the panel of nucleic 
acid molecules is specific to a particular bacterial O 
antigen. Preferably the particular bacterial O antigen is
expressed by S. enterica. The panel of nucleic acid
molecules can include nucleic acid molecules derived from
sugar pathway genes where necessary.

In another aspect, the invention relates to a method 
of testing a sample for the presence of one or more
particular bacterial O antigens comprising contacting the 
sample with at least one pair of oligonucleotide molecules 
with at least one oligonucleotide molecule of the pair 
being capable of specifically hybridising to: (i) a gene 
encoding an O antigen transferase, or (ii) a gene encoding 
an enzyme for transport or processing of the 
oligosaccharide or polysaccharide unit, including a wzx or 
wzy gene; wherein said gene is involved in the synthesis 
of the particular O antigen; under conditions suitable to
permit the at least one oligonucleotide molecule to 
specifically hybridise to at least one such gene of any 
bacteria expressing the particular bacterial O antigen 
present in the sample and detecting any specifically 
hybridised oligonucleotide molecules. 

Preferably the bacteria are E. coli or S. enterica.
More preferably, the E. coli are of the O111 or the O157
serotype. More preferably, the S. enterica express the C2
or B serotype. Preferably, the method is a polymerase
chain reaction method. More preferably, the oligonucleotide
molecules for use in the method of the invention are
labelled. Even more preferably, the hybridised
oligonucleotide molecules are detected by electrophoresis.
Preferred oligonucleotides which provide for specific
detection of O111 are illustrated in Tables 5 and 5A with
respect to the genes wbdH, wzx, wzy and wbdM. Preferred
oligonucleotide molecules which provide for specific
detection of O157 are illustrated in Tables 6 and 6A.
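As an illustration of the PCR format only, the following Python sketch shows how one might check in silico whether a primer pair would give a product on a known template sequence, and what band size to expect on the gel. The function names, primers and template are hypothetical placeholders, not the primers of Tables 5, 5A, 6 and 6A, and the check assumes exact primer matches, whereas real hybridisation tolerates some mismatches.

```python
"""Hypothetical in-silico PCR check for one primer pair (illustrative sketch)."""

from typing import Optional

# Maps each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward: str, reverse: str) -> Optional[int]:
    """Return the product length for a forward/reverse primer pair on the
    top strand of `template`, or None if no product is expected.

    Assumption: only exact matches count, so this is a crude screen.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand, so its reverse
    # complement must occur downstream of the forward primer.
    rc_rev = reverse_complement(reverse.upper())
    end = template.find(rc_rev, start + len(forward))
    if end == -1:
        return None
    return end + len(rc_rev) - start
```

A pair that returns a length only on templates from the target serotype would be expected to give a serotype-specific band; a pair that also returns a length on other serotypes' sequences would not.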

With respect to serotypes C2 and B, suitable
oligonucleotide molecules can be selected from the appropriate
regions described in column 3 of Tables 7 and 8.

The inventors envisage rare circumstances whereby two
genetically similar gene clusters encoding serologically
different O antigens have arisen through recombination of
genes or mutation so as to generate polymorphic variants.
In these circumstances multiple pairs of oligonucleotides
may be selected to provide hybridisation to the specific
combination of genes. The invention thus provides a panel
containing pairs of nucleic acid molecules for use in the
method of testing of the invention, wherein the pairs of
nucleic acid molecules are derived from genes encoding
transferases and/or enzymes for the transport or
processing of a polysaccharide or oligosaccharide unit
including wzx or wzy genes, wherein the panel of nucleic
acid molecules is specific to a particular bacterial O
antigen. Preferably the particular bacterial O antigen is
expressed by S. enterica. The panel of nucleic acid
molecules can include pairs of nucleic acid molecules
derived from sugar pathway genes where necessary.

In another aspect, the invention relates to a method
for testing a food-derived sample for the presence of one
or more particular bacterial O antigens comprising
contacting the sample with at least one pair of
oligonucleotide molecules with at least one oligonucleotide
molecule of the pair being capable of specifically
hybridising to: (i) a gene encoding an O antigen
transferase, or (ii) a gene encoding an enzyme for
transport or processing of the oligosaccharide or
polysaccharide unit, including a wzx or wzy gene; wherein
the gene is involved in the synthesis of the particular O
antigen; under conditions suitable to permit the at least
one oligonucleotide molecule to specifically hybridise to
at least one such gene of any bacteria expressing the
particular bacterial polysaccharide antigen present in the
sample and detecting any specifically hybridised
oligonucleotide molecules. Preferably the bacteria are E.
coli or S. enterica. More preferably, the E. coli are of
the O111 or O157 serotype. More preferably, the S.
enterica are of the C2 or B serotype. Preferably, the
method is a polymerase chain reaction method. More
preferably, the oligonucleotide molecules for use in the
method of the invention are labelled. Even more
preferably, the hybridised oligonucleotide molecules are
detected by electrophoresis.

In another aspect the present invention relates to a
method for testing a faecal-derived sample for the presence
of one or more particular bacterial O antigens comprising
contacting the sample with at least one pair of
oligonucleotide molecules with at least one oligonucleotide
molecule of the pair being capable of specifically
hybridising to: (i) a gene encoding an O antigen
transferase, or (ii) a gene encoding an enzyme for
transport or processing of the oligosaccharide or
polysaccharide unit, including a wzx or wzy gene; wherein
said gene is involved in the synthesis of the particular O
antigen; under conditions suitable to permit the at least
one oligonucleotide molecule to specifically hybridise to
at least one of said genes of any bacteria expressing the
particular bacterial O antigen present in the sample and
detecting any specifically hybridised oligonucleotide
molecules. Preferably the bacteria are E. coli or S.
enterica. More preferably, the E. coli are of the O111 or
O157 serotype. More preferably, the S. enterica are of
the C2 or B serotype. Preferably, the method is a
polymerase chain reaction method. More preferably, the
oligonucleotide molecules for use in the method of the
invention are labelled. Even more preferably, the
hybridised oligonucleotide molecules are detected by
electrophoresis.



In another aspect, the present invention relates to a
method for testing a sample derived from a patient for the
presence of one or more particular bacterial O antigens
comprising contacting the sample with at least one pair of
oligonucleotide molecules with at least one oligonucleotide
molecule of the pair being capable of specifically
hybridising to: (i) a gene encoding an O antigen
transferase, or (ii) a gene encoding an enzyme for
transport or processing of the oligosaccharide or
polysaccharide unit, including a wzx or wzy gene; wherein
said gene is involved in the synthesis of the particular O
antigen; under conditions suitable to permit the at least
one oligonucleotide molecule to specifically hybridise to
at least one such gene of any bacteria expressing the
particular bacterial O antigen present in the sample and
detecting any specifically hybridised oligonucleotide
molecules. Preferably the bacteria are E. coli or S.
enterica. More preferably, the E. coli are of the O111 or
O157 serotype. More preferably, the S. enterica are of
the C2 or B serotype. Preferably, the method is a
polymerase chain reaction method. More preferably, the
oligonucleotide molecules for use in the method of the
invention are labelled. Even more preferably, the
hybridised oligonucleotide molecules are detected by
electrophoresis.

In the above-described methods it will be understood
that, where pairs of oligonucleotides are used, one of the
oligonucleotide sequences may hybridise to a sequence that
is not from a transferase, wzx or wzy gene. Further, where
both hybridise to these genes, they may hybridise to the
same gene or to different genes.

In addition, it will be understood that where cross-
reactivity is an issue, a combination of oligonucleotides
may be chosen to detect a combination of genes to provide
specificity.
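The combination principle can be written as a simple decision rule: an antigen is called only when every target in its combination gives a positive result. The Python sketch below is a hypothetical illustration; the antigen and target names are placeholders, not the panels of the specification, and it assumes each target is scored simply as detected or not detected.

```python
"""Hypothetical decision rule for a combination panel (illustrative only)."""

from typing import Dict, List, Set

# Made-up panel: antigen -> targets that must ALL give a positive result.
PANEL: Dict[str, Set[str]] = {
    "antigen_X": {"transferase_1", "wzx_X"},
    "antigen_Y": {"transferase_1", "wzy_Y"},
}


def call_antigens(detected: Set[str], panel: Dict[str, Set[str]] = PANEL) -> List[str]:
    """Return, sorted, the antigens whose full target combination was detected.

    The shared target (transferase_1) cross-reacts between X and Y on its
    own; requiring the whole combination restores specificity.
    """
    return sorted(antigen for antigen, targets in panel.items() if targets <= detected)
```

With only the shared target detected, no antigen is called; adding "wzx_X" resolves the call to "antigen_X".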

The invention further relates to a diagnostic kit
which can be used for the detection of bacteria which
express bacterial polysaccharide antigens and the
identification of the bacterial polysaccharide type of
those bacteria.

Thus in a further aspect, the invention relates to a
kit comprising a first vial containing a first nucleic
acid molecule capable of specifically hybridising to: (i)
a gene encoding a transferase, or (ii) a gene encoding an
enzyme for transport or processing of an oligosaccharide or
polysaccharide, including a wzx or wzy gene, wherein
said gene is involved in the synthesis of a bacterial
polysaccharide. The kit may also provide in the same or a
separate vial a second specific nucleic acid capable of
specifically hybridising to: (i) a gene encoding a
transferase, or (ii) a gene encoding an enzyme for
transport or processing of an oligosaccharide or
polysaccharide, including a wzx or wzy gene, wherein said
gene is involved in the synthesis of a bacterial
polysaccharide, wherein the sequence of the second nucleic
acid molecule is different from the sequence of the first
nucleic acid molecule.

In a further aspect the invention relates to a kit
comprising a first vial containing a first nucleic acid
molecule capable of specifically hybridising to: (i) a
gene encoding a transferase, or (ii) a gene encoding an
enzyme for transport or processing of an oligosaccharide or
polysaccharide, including wzx or wzy, wherein said gene
is involved in the synthesis of a bacterial O antigen.
The kit may also provide in the same or a separate vial a
second specific nucleic acid capable of specifically
hybridising to: (i) a gene encoding a transferase, or
(ii) a gene encoding an enzyme for transport or processing
of an oligosaccharide or polysaccharide, including wzx or
wzy, wherein said gene is involved in the synthesis of an O
antigen, wherein the sequence of the second nucleic acid
molecule is different from the sequence of the first
nucleic acid molecule. Preferably the first and second
nucleic acid sequences are derived from E. coli, or the
first and second nucleic acid sequences are derived from
S. enterica.

The present inventors provide the full-length sequence of
the O157 gene cluster for the first time and recognise
that, from the sequence of this previously uncloned gene
cluster, appropriate recombinant molecules can be
generated and inserted for expression to provide expressed
O157 antigens useful in applications such as vaccines.

DEFINITIONS 

The phrase "a nucleic acid molecule derived from a
gene" means that the nucleic acid molecule has a
nucleotide sequence which is either identical or
substantially similar to all or part of the identified
gene. Thus a nucleic acid molecule derived from a gene
can be a molecule which is isolated from the identified
gene by physical separation from that gene, or a molecule
which is artificially synthesised and has a nucleotide
sequence which is either identical to or substantially
similar to all or part of the identified gene. Although some
workers consider only the DNA strand with the same
sequence as the mRNA transcribed from the gene, either
strand is intended here.
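Because either strand is intended, a test of whether a fragment is "derived from" a gene has to look at both strands. The Python sketch below is illustrative only and its function names are hypothetical: it checks exact identity to part of either strand and does not attempt the "substantially similar" case, which would need an alignment score and a threshold chosen by the user.

```python
"""Either strand of a gene: the given strand and its complement (sketch)."""

# Maps each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def other_strand(seq: str) -> str:
    """Return the complementary strand, written 5'->3' (reverse complement)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def derived_from(fragment: str, gene: str) -> bool:
    """True if `fragment` is identical to part of either strand of `gene`.

    Exact identity only; 'substantially similar' is not handled here.
    """
    fragment, gene = fragment.upper(), gene.upper()
    return fragment in gene or fragment in other_strand(gene)
```

A fragment matching only the complementary strand, such as a reverse-orientation primer, still counts as derived from the gene under this definition.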

Transferase genes are regions of nucleic acid which
have a nucleotide sequence which encodes gene products
that transfer monomeric sugar units.

Flippase or wzx genes are regions of nucleic acid
which have a nucleotide sequence which encodes a gene
product that flips oligosaccharide repeat units, generally
composed of three to six monomeric sugar units, to the
external surface of the membrane.

Polymerase or wzy genes are regions of nucleic acid
which have a nucleotide sequence which encodes gene
products that polymerise repeating oligosaccharide units,
generally composed of three to six monomeric sugar units.

The nucleotide sequences provided in this
specification are described in the sequence listing as
anti-sense sequences. This term is used in the same
manner as it is used in Glossary of Biochemistry and
Molecular Biology, Revised Edition, David M. Glick, 1997,
Portland Press Ltd., London, on page 11, where the term is
described as referring to one of the two strands of
double-stranded DNA, usually that which has the same
sequence as the mRNA. We use it to describe this strand,
which has the same sequence as the mRNA.
NOMENCLATURE

Synonyms for E. coli O111 rfb

Current names   Our names   Bastin et al. 1991

wbdH            orf1
gmd             orf2        -
wbdI            orf3        orf3.4*
manC            orf4        rfbM*
manB            orf5        rfbK*
wbdJ            orf6        orf6.7*
wbdK            orf7        orf7.7*
wzx             orf8        orf8.9 and rfbX*
wzy             orf9
wbdL            orf10
wbdM            orf11

* Nomenclature according to Bastin D.A., et al. 1991. "Molecular
cloning and expression in Escherichia coli K-12 of the rfb gene
cluster determining the O antigen of an E. coli O111 strain". Mol.
Microbiol. 5(9): 2223-2231.



Other Synonyms

wzy     rfc
wzx     rfbX
rmlA    rfbA
rmlB    rfbB
rmlC    rfbC
rmlD    rfbD
glf     orf6*
wbbI    orf3#, orf8* of E. coli K-12
wbbJ    orf2#, orf9* of E. coli K-12
wbbK    orf1#, orf10* of E. coli K-12
wbbL    orf5#, orf11* of E. coli K-12

# Nomenclature according to Yao, Z. and Valvano, M. A. 199
"Genetic analysis of the O-specific lipopolysaccharide biosynthesis
region (rfb) of Escherichia coli K-12 W3110: identification of genes
that confer group-specificity to Shigella flexneri serotypes Y and
4a". J. Bacteriol. 176: 4133-4143.

* Nomenclature according to Stevenson et al. 1994. "Structure of
the O-antigen of E. coli K-12 and the sequence of its rfb gene
cluster". J. Bacteriol. 176: 4144-4156.

• S. enterica is a name introduced in 1987 to replace the many other
names such as Salmonella typhi and Salmonella typhimurium, the old
species names becoming serovar names, as in S. enterica sv Typhi.
However, the traditional names are still widely used.

• The O antigen genes of many species were given rfb names (rfbA etc.)
and the O antigen gene cluster was often referred to as the rfb
cluster. There are now new names for the rfb genes, as shown in the
table. Both terminologies have been used herein, depending on the
source of the information.
BRIEF DESCRIPTION OF DRAWINGS

Figure 1 shows EcoRI restriction maps of cosmid
clones pPR1054, pPR1055, pPR1056, pPR1058 and pPR1287, which
are subclones of the E. coli O111 O antigen gene cluster. The
thickened line is the region common to all clones. Broken
lines show segments that are non-contiguous on the
chromosome. The deduced restriction map for E. coli
strain M92 is shown above.

Figure 2 shows a restriction mapping analysis of the
E. coli O111 O antigen gene cluster within the cosmid clone
pPR1058. Restriction enzymes are: B: BamHI; Bg: BglII;
E: EcoRI; H: HindIII; K: KpnI; P: PstI; S: SalI; and X:
XhoI. Plasmids pPR1230, pPR1231 and pPR1288 are deletion
derivatives of pPR1058. Plasmids pPR1237, pPR1238,
pPR1239 and pPR1240 are in pUC19. Plasmids pPR1243,
pPR1244, pPR1245, pPR1246 and pPR1248 are in pUC18, and
pPR1292 is in pUC19. Plasmid pPR1270 is in pT7T319U.
Probes 1, 2 and 3 were isolated as internal fragments of
pPR1246, pPR1243 and pPR1237 respectively. Dotted lines
indicate that subclone DNA extends to the left of the map
into attached vector.

Figure 3 shows the structure of the E. coli O111 O
antigen gene cluster.

Figure 4 shows the structure of the E. coli O157 O
antigen gene cluster.

Figure 5 shows the structure of the S. enterica locus
encoding the serogroup C2 O antigen gene cluster.

Figure 6 shows the structure of the S. enterica locus
encoding the serogroup B O antigen gene cluster.

Figure 7 shows the nucleotide sequence of the E. coli
O111 O antigen gene cluster. Note: (1) The first and last
three bases of a gene are underlined and in italics
respectively; (2) The region which was previously
sequenced by Bastin and Reeves 1995 "Sequence and analysis
of the O antigen gene (rfb) cluster of Escherichia coli
O111". Gene 164: 17-23 is marked.

Figure 8 shows the nucleotide sequence of the E. coli
O157 O antigen gene cluster. Note: (1) The first and last
three bases of a gene (region) are underlined and in italics
respectively; (2) The region previously sequenced by Bilge
et al. 1996 "Role of the Escherichia coli O157:H7 O side
chain in adherence and analysis of an rfb locus". Infect.
Immun. 64: 4795-4801 is marked.

Figure 9 shows the nucleotide sequence of the S. enterica
serogroup C2 O antigen gene cluster. Note:
(1) The numbering is as in Brown et al. 1992. "Molecular
analysis of the rfb gene cluster of Salmonella serovar
muenchen (strain M67): the genetic basis of the
polymorphism between groups C2 and B". Mol. Microbiol. 6:
1385-1394. (2) The first and last three bases of a gene are
underlined and in italics respectively. (3) Only that part
of the group C2 gene cluster which differs from that of
group B was sequenced and is presented here.

Figure 10 shows the nucleotide sequence of the S. enterica
serogroup B O antigen gene cluster. Note: (1) The numbering is as
in Jiang et al. 1991. "Structure and sequence of the rfb (O
antigen) gene cluster of Salmonella serovar typhimurium (strain
LT2)". Mol. Microbiol. 5: 695-713. The first gene in the O
antigen gene cluster is rmlB, which starts at base 4099. (2) The
first and last three bases of a gene are underlined and in
italics respectively.



BEST METHOD FOR CARRYING OUT THE INVENTION

Materials and Methods - part 1

The experimental procedures for the isolation and
characterisation of the E. coli O111 O antigen gene
cluster (positions 3,021-9,981) are according to Bastin
D.A., et al. 1991 "Molecular cloning and expression in
Escherichia coli K-12 of the rfb gene cluster determining
the O antigen of an E. coli O111 strain". Mol. Microbiol.
5(9): 2223-2231 and Bastin D.A. and Reeves, P.R. 1995
"Sequence and analysis of the O antigen gene (rfb) cluster
of Escherichia coli O111". Gene 164: 17-23.
A. Bacterial strains and growth media 

Bacteria were grown in Luria broth supplemented as
required.
B. Cosmids and phage

Cosmids in the host strain χ2819 were repackaged in
vivo. Cells were grown in 250 mL flasks containing 30 mL of
culture, with moderate shaking at 30°C, to an optical
density of 0.3 at 580 nm. The defective lambda prophage
was induced by heating in a water bath at 45°C for 15 min,
followed by incubation at 37°C with vigorous shaking
for 2 h. Cells were then lysed by the addition of 0.3 mL
of chloroform and shaking for a further 10 min. Cell debris
was removed from 1 mL of lysate by a 5 min spin in a
microcentrifuge, and the supernatant was transferred to a
fresh microfuge tube. One drop of chloroform was added and
shaken vigorously through the tube contents.
C. DNA preparation 
Chromosomal DNA was prepared from bacteria grown
overnight at 37°C in a volume of 30 mL of Luria broth.
After harvesting by centrifugation, cells were washed and
resuspended in 10 mL of 50 mM Tris-HCl, pH 8.0. EDTA was
added and the mixture incubated for 20 min. Lysozyme was
then added and incubation continued for a further 10 min.
Proteinase K, SDS and ribonuclease were then added and
the mixture incubated for up to 2 h for lysis to occur.
All incubations were at 37°C. The mixture was then heated
to 65°C and extracted once with 8 mL of phenol at the same
temperature. The mixture was extracted once with 5 mL of
phenol/chloroform/isoamyl alcohol at 4°C. Residual
phenol was removed by two ether extractions. DNA was
precipitated with 2 volumes of ethanol at 4°C, spooled,
washed in 70% ethanol, resuspended in 1-2 mL of TE and
dialysed. Plasmid and cosmid DNA was prepared by a
modification of the Birnboim and Doly method [Birnboim, H.
C. and Doly, J. (1979) "A rapid alkaline extraction
procedure for screening recombinant plasmid DNA". Nucleic
Acids Res. 7: 1513-1523]. The volume of culture was 10 mL
and the lysate was extracted with phenol/chloroform/isoamyl
alcohol before precipitation with isopropanol. Plasmid
DNA to be used as vector was isolated on a continuous
caesium chloride gradient following alkaline lysis of
cells grown in 1 L of culture.

D. Enzymes and buffers.
Restriction endonucleases and T4 DNA ligase were
purchased from Boehringer Mannheim (Castle Hill, NSW,
Australia) or Pharmacia LKB (Melbourne, VIC, Australia).
Restriction enzymes were used in the recommended
commercial buffer.

E. Construction of a gene bank.
Individual aliquots of M92 chromosomal DNA (strain
Stoke W, from Statens Serum Institut, 5 Artillerivej, 2300
Copenhagen S, Denmark) were partially digested with 0.2 U
of Sau3AI for 1-15 min. Aliquots giving the greatest
proportion of fragments in the size range of approximately
40-50 kb were selected and ligated to vector pPR691
previously digested with BamHI and PvuII. Ligation
mixtures were packaged in vitro with packaging extract.
The host strain for transduction was χ2819 and
recombinants were selected with kanamycin.

F. Serological procedures.

Colonies were screened for the presence of the O111
antigen by immunoblotting. Colonies were grown overnight,
up to 100 per plate, then transferred to nitrocellulose
discs and lysed with 0.5 N HCl. Tween 20 was added to TBS
at 0.05% final concentration for blocking, incubating and
washing steps. The primary antibody was E. coli O group 111
antiserum, diluted 1:800. The secondary antibody was goat
anti-rabbit IgG labelled with horseradish peroxidase,
diluted 1:5000. The staining substrate was
4-chloro-1-naphthol. Slide agglutination was performed
according to the standard procedure.

G. Recombinant DNA methods.

Restriction mapping was based on a combination of
standard methods including single and double digests and
subcloning. Deletion derivatives of entire cosmids were
produced as follows: aliquots of 1.8 µg of cosmid DNA were
digested in a volume of 20 µL with 0.25 U of restriction
enzyme for 5-80 min. One half of each aliquot was used to
check the degree of digestion on an agarose gel. The
sample which appeared to give a representative range of
fragments was ligated at 4°C overnight and transformed by
the CaCl2 method into JM109. Selected plasmids were
transformed into Sφ174 by the same method. P4657 was
transformed with pPR1244 by electroporation.
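The single and double digests used for restriction mapping rest on simple arithmetic: each digest cuts the molecule at every site of the enzymes used, and the fragment sizes are the distances between adjacent cuts. The Python sketch below assumes a linear map, as drawn in Figure 2, rather than a circular plasmid; the function name and the site positions in the usage note are hypothetical.

```python
"""Predicted fragment sizes for single and double digests of a linear map."""

from typing import Iterable, List


def fragments(length_kb: float, *site_lists: Iterable[float]) -> List[float]:
    """Return fragment sizes (kb, largest first) after cutting a linear
    molecule of `length_kb` at every site in all of `site_lists`.

    One site list models a single digest; two model a double digest.
    Sites at or beyond the ends of the molecule are ignored.
    """
    cuts = sorted({pos for sites in site_lists for pos in sites if 0 < pos < length_kb})
    bounds = [0.0] + cuts + [length_kb]
    return sorted((round(b - a, 2) for a, b in zip(bounds, bounds[1:])), reverse=True)
```

For example, `fragments(10.0, [3.0, 7.0])` gives `[4.0, 3.0, 3.0]` for a single digest, and `fragments(10.0, [3.0, 7.0], [5.0])` gives `[3.0, 3.0, 2.0, 2.0]` for the double digest. Comparing such predictions with the observed gel bands is what places each site on the map.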
H. DNA hybridisation 

Probe DNA was extracted from agarose gels by
electroelution and was nick-translated using [α-32P]dCTP.
Chromosomal or plasmid DNA was electrophoresed in 0.8%
agarose and transferred to a nitrocellulose membrane. The
hybridisation and pre-hybridisation buffers contained
either 30% or 50% formamide for low and high stringency
probing respectively. Incubation temperatures were 42°C
and 37°C for pre-hybridisation and hybridisation
respectively. Low stringency washing of filters consisted
of 3 × 20 min washes in 2× SSC and 0.1% SDS. High
stringency washing consisted of 3 × 5 min washes in 2× SSC
and 0.1% SDS at room temperature, a 1 h wash in 1× SSC
and 0.1% SDS at 58°C, and a 15 min wash in 0.1× SSC and
0.1% SDS at 58°C.
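The difference between the low and high stringency conditions above can be rationalised with a widely quoted empirical approximation for the melting temperature of long DNA hybrids, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) − 0.61(%formamide) − 500/L. The Python sketch below is not taken from the specification: the Na+ concentration assumed for 1× SSC, the GC content, the probe length and the function name are illustrative assumptions, and the result is only a ballpark figure.

```python
"""Rough Tm estimate for long DNA hybrids, to compare stringencies (sketch)."""

import math

# Assumption: 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate, so about 0.195 M Na+.
NA_PER_X_SSC = 0.195


def tm_long_duplex(ssc_x: float, gc_percent: float,
                   formamide_percent: float = 0.0, length_bp: int = 1000) -> float:
    """Approximate Tm (deg C) of a long DNA-DNA hybrid:
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    na_molar = ssc_x * NA_PER_X_SSC
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / length_bp)
```

Under these assumptions, moving from 2× SSC to 0.1× SSC lowers the estimated Tm by about 21.6°C (16.6 × log10 20), and raising formamide from 30% to 50% lowers it by about 12.2°C. A lower hybrid Tm at a fixed temperature means that imperfectly matched hybrids are more readily removed, which is why these are the more stringent conditions.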

I. Nucleotide sequencing of the E. coli O111 O antigen gene
cluster (positions 3,021-9,981)

Nucleotide sequencing was performed using an ABI 373
automated sequencer (CA, USA). The region between map
positions 3.30 and 7.90 was sequenced using
unidirectional exonuclease III digestion of deletion
families made in pT7T319U from clones pPR1270 and pPR1272.
Gaps were filled largely by cloning of selected fragments
into M13mp18 or M13mp19. The region from map positions
7.90-10.2 was sequenced from restriction fragments in
M13mp18 or M13mp19. Remaining gaps in both regions
were filled by priming from synthetic oligonucleotides
complementary to determined positions along the sequence,
using a single-stranded DNA template in M13 or phagemid.
The oligonucleotides were designed after analysing the
adjacent sequence. All sequencing was performed by the
chain termination method. Sequences were aligned using SAP
[Staden, R. 1982 "Automation of the computer handling of
gel reading data produced by the shotgun method of DNA
sequencing". Nucleic Acids Res. 10: 4731-4751; Staden, R.
1986 "The current status and portability of our sequence
handling software". Nucleic Acids Res. 14: 217-231]. The
program NIP [Staden, R. 1982 "An interactive graphics
program for comparing and aligning nucleic acid and amino
acid sequence". Nucleic Acids Res. 10: 2951-2961] was used
to find open reading frames and translate them into proteins.
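NIP was used to find open reading frames and translate them. The Python sketch below shows only the basic idea and is not a reimplementation of NIP. It makes simplifying assumptions: only the three forward frames are scanned, an ORF runs from ATG to the first in-frame stop codon, the standard genetic code is used, and the minimum length (`min_codons`) is an arbitrary parameter.

```python
"""Minimal open-reading-frame finder and translator (illustrative sketch)."""

from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(seq: str) -> str:
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int, str]]:
    """Return (start, end, protein) for ATG...stop ORFs in the forward frames.

    `end` includes the stop codon; `protein` excludes it. A full search
    would repeat this on the reverse complement.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, translate(seq[start:i])))
                start = None
    return orfs
```

Running `find_orfs` on a gene cluster sequence with a realistic `min_codons` gives candidate genes, whose translated products can then be compared with known proteins, as in the analysis of orf3 to orf8 in this section.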
J. Isolation of clones carrying the E. coli O111 O antigen
gene cluster

The E. coli O111 O antigen gene cluster was isolated
according to the method of Bastin D.A., et al. [1991
"Molecular cloning and expression in Escherichia coli K-12
of the rfb gene cluster determining the O antigen of an E.
coli O111 strain". Mol. Microbiol. 5(9): 2223-2231].
Cosmid gene banks of M92 chromosomal DNA were established
in the in vivo packaging strain χ2819. From the genomic
bank, 3.3 × 10³ colonies were screened with E. coli O111
antiserum using an immunoblotting procedure: 5 colonies
(pPR1054, pPR1055, pPR1056, pPR1058 and pPR1287) were
positive. The cosmids from these strains were packaged in
vivo into lambda particles and transduced into the E. coli
deletion mutant Sφ174, which lacks all O antigen genes. In
this host strain, all plasmids gave positive agglutination
with O111 antiserum. An EcoRI restriction map of the 5
independent cosmids showed that they have a region of
approximately 11.5 kb in common (Figure 1). Cosmid
pPR1058 included sufficient flanking DNA to identify
several chromosomal markers linked to the O antigen gene
cluster and was selected for analysis of the O antigen
gene cluster region.

K. Restriction mapping of cosmid pPR1058
Cosmid pPR1058 was mapped in two stages. A
preliminary map was constructed first, and then the region
between map positions 0.00 and 23.10 was mapped in detail,
since it was shown to be sufficient for O111 antigen
expression. Restriction sites for both stages are shown
in Figure 2. The region common to the five cosmid clones
was between map positions 1.35 and 12.95 of pPR1058.

To locate the O antigen gene cluster within pPR1058,
the pPR1058 cosmid was probed with DNA probes covering the
O antigen gene cluster flanking regions from S. enterica
LT2 and E. coli K-12. Capsular polysaccharide (cps) genes
lie upstream of the O antigen gene cluster, while the
gluconate dehydrogenase (gnd) gene and the histidine (his)
operon are downstream, the latter being further from the O
antigen gene cluster. The probes used were pPR472
(3.35 kb), carrying the gnd gene of LT2; pPR685 (5.3 kb),
carrying two genes of the cps cluster, cpsB and cpsG of
LT2; and K350 (16.5 kb), carrying all of the his operon of
K-12. Probes hybridised as follows: pPR472 hybridised to
1.55 kb and 3.5 kb (including 2.7 kb of vector) fragments
of PstI and HindIII double digests of pPR1246 (a
HindIII/EcoRI subclone derived from pPR1058, Figure 2),
which could be located at map positions 12.95-15.1; pPR685
hybridised to a 4.4 kb EcoRI fragment of pPR1058
(including 1.3 kb of vector) located at map positions
0.00-3.05; and K350 hybridised with a 32 kb EcoRI fragment
of pPR1058 (including 4.0 kb of vector), located at map
positions 17.30-45.90. Subclones containing the presumed
gnd region complemented the gnd- edd- strain GB23152: on
gluconate bromothymol blue plates, pPR1244 and pPR1292 in
this host strain gave the green colonies expected of a
gnd+ edd- genotype. The His+ phenotype was restored by
plasmid pPR1058 in the his deletion strain Sφ174 on
minimal medium plates, showing that the plasmid carries
the entire his operon.

It is likely that the O antigen gene cluster region
lies between gnd and cps, as in other E. coli and S.
enterica strains, and hence between the approximate map
positions 3.05 and 12.95. To confirm this, deletion
derivatives of pPR1058 were made as follows. First,
pPR1058 was partially digested with HindIII and
self-ligated. Transformants were selected for kanamycin
resistance and screened for expression of O111 antigen.
Two colonies gave a positive reaction. EcoRI digestion
showed that the two colonies hosted identical plasmids,
one of which was designated pPR1230, with an insert which
extended from map positions 0.00 to 23.10. Second, pPR1058
was digested with SalI and partially digested with XhoI,
and the compatible ends were re-ligated. Transformants
were selected with kanamycin and screened for O111 antigen
expression. Plasmid DNA of 8 positively reacting clones
was checked using EcoRI and XhoI digestion and appeared to
be identical. The cosmid of one was designated pPR1231.
The insert of pPR1231 contained the DNA region between map
positions 0.00 and 15.10. Third, pPR1231 was partially
digested with XhoI, self-ligated, and transformants
selected on spectinomycin/streptomycin plates. Clones
were screened for kanamycin sensitivity and, of 10
selected, all had the DNA region from the XhoI site in the
vector to the XhoI site at position 4.00 deleted. These
clones did not express the O111 antigen, showing that the
XhoI site at position 4.00 is within the O antigen gene
cluster. One clone was selected and named pPR1288.
Plasmids pPR1230, pPR1231 and pPR1288 are shown in Figure
2.

L. Analysis of the E. coli Olll O antigen gene 

cluster (position 3,021-9,981) nucleotide sequence data 

Bastin and Reeves [1995 "Sequence and analysis of the O antigen gene (rfb) cluster of Escherichia coli O111". Gene 164: 17-23] partially characterised the E. coli O111 O antigen gene cluster by sequencing a fragment from map position 3,021 to 9,981. Figure 3 shows the gene organisation of positions 3,021-9,981 of the E. coli O111 O antigen gene cluster. orf3 and orf6 have high levels of amino acid identity with wcaH and wcaG (46.3% and 37.2% respectively), and are likely to be similar in function to these sugar biosynthetic pathway genes of the E. coli K-12 colanic acid gene cluster. orf4 and orf5 show high levels of amino acid homology to the manC and manB genes respectively. orf7 shows high level homology with rfbH, which is an abequose pathway gene. orf8 encodes a protein with 12 transmembrane segments and has similarity in secondary structure to other wzx genes; it is therefore likely to be the O antigen flippase gene.

WO 98/50531 PCT/AU98/00315
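Percent identity figures such as those quoted above are computed over a pairwise alignment, and the result depends on the convention used. The sketch below assumes the two sequences have already been aligned (gaps written as "-") and counts identity only over columns where neither sequence has a gap; other conventions divide by the full alignment length or by the shorter sequence. The function name and the toy sequences are hypothetical, not O111 data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pre-computed alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue (no gap).
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Toy alignment: 8 gap-free columns, 6 of them identical.
print(percent_identity("MKT-LLAVG", "MKSALLGVG"))  # 75.0
```

Because the denominator differs between conventions, identity values from different programs are best compared only when the same definition is used.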

Materials and Methods - part 2

A. Nucleotide sequencing of positions 1 to 3,020 and 9,982 to 14,516 of the E. coli O111 O antigen gene cluster

The subclones which contained novel nucleotide sequences, pPR1231 (map position 0 and 1,510), pPR1237 (map position -300 to 2,744), pPR1239 (map position 2,744 to 4,168), pPR1245 (map position 9,736 to 12,007) and pPR1246 (map position 12,007 to 15,300) (Figure 2), were characterised as follows. The distal ends of the inserts of pPR1237, pPR1239 and pPR1245 were sequenced using the M13 forward and reverse primers located in the vector. PCR walking was carried out to sequence further into each insert using primers based on the sequence data; these primers were tagged with M13 forward or reverse primer sequences for sequencing. This PCR walking procedure was repeated until the entire insert was sequenced. pPR1246 was characterised from position 12,007 to 14,516. The DNA of these subclones was sequenced in both directions. The sequencing reactions were performed using the dideoxy termination method and thermocycling, and reaction products were analysed using fluorescent dye and an ABI automated sequencer (CA, USA).
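The walking loop described above can be pictured as a simple simulation: each round reads a fixed number of bases from the current primer site, and the next primer is taken from the 3' end of what has already been read, tagged at its 5' end with an M13 sequencing-primer site. This is a sketch under stated assumptions only: one strand, error-free reads, a fixed read length and primer length, and the commonly used M13 forward (-20) sequence as the tag. None of these parameters are given in the text, and the function name is hypothetical.

```python
import random

M13_FORWARD = "GTAAAACGACGGCCAGT"  # assumed tag: commonly used M13/pUC forward (-20) primer


def walk_insert(insert, read_length=500, primer_length=20):
    """Simulate primer walking along one strand of a cloned insert.

    Assumes error-free reads of fixed length and that the first read starts
    at position 0 from a vector primer. Returns the assembled sequence and
    the list of M13-tagged walking primers that were designed.
    """
    if primer_length > read_length:
        raise ValueError("primer must fit inside one read")
    known = ""
    primers = []
    position = 0
    while position < len(insert):
        read = insert[position:position + read_length]
        known += read
        position += len(read)
        if position >= len(insert):
            break  # end of insert reached; no further primer needed
        # Next primer = last bases already sequenced, so the next read
        # starts immediately 3' of the primer binding site.
        primers.append(M13_FORWARD + known[-primer_length:])
    return known, primers


random.seed(0)
insert = "".join(random.choice("ACGT") for _ in range(1234))
seq, primers = walk_insert(insert)
print(seq == insert, len(primers))  # True 2
```

In practice reads overlap and are edited for quality before the next primer is chosen; the sketch only shows why the number of walking rounds scales with insert length divided by usable read length.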

B. Analysis of the E. coli O111 O antigen gene cluster (positions 1 to 3,020 and 9,982 to 14,516 of SEQ ID NO:1) nucleotide sequence data

The gene organisation of the regions of the E. coli O111 O antigen gene cluster which were not characterised by Bastin and Reeves [1995 "Sequence and analysis of the O antigen gene (rfb) cluster of Escherichia coli O111." Gene 164: 17-23] (positions 1 to 3,020 and 9,982 to 14,516) is shown in Figure 3. There are two open reading frames in region 1. Four open reading frames are predicted in region 2. The position of each gene is listed in Table 5. The deduced amino acid sequence of orf1 (wbdH) shares about 64% similarity with that of the rfp gene of Shigella dysenteriae. Rfp and WbdH have very similar hydrophobicity plots, and both have a convincing predicted transmembrane segment in a corresponding position. rfp encodes a galactosyl transferase involved in the synthesis of the LPS core; thus wbdH is likely to be a galactosyl transferase gene. orf2 has 85.7% identity at the amino acid level to the gmd gene identified in the E. coli K-12 colanic acid gene cluster and is likely to be a gmd gene. orf9 encodes a protein with 10 predicted transmembrane segments and a large cytoplasmic loop. This inner membrane topology is a characteristic feature of all known O antigen polymerases; thus orf9 is likely to encode an O antigen polymerase, wzy. orf10 (wbdL) has a deduced amino acid sequence with low homology to Lsi2 of Neisseria gonorrhoeae. Lsi2 is responsible for adding GlcNAc to galactose in the synthesis of lipooligosaccharide. Thus wbdL is likely to be either a colitose or a glucose transferase gene. orf11 (wbdM) shares high level nucleotide and amino acid similarity with TrsE of Yersinia enterocolitica. TrsE is a putative sugar transferase; thus wbdM is likely to encode the colitose or glucose transferase.
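Hydrophobicity plots and predicted transmembrane segments of the kind compared above are classically obtained with a sliding-window hydropathy scale. The text does not say which method was used, so the sketch below assumes the Kyte and Doolittle scale with their suggested 19-residue window and 1.6 threshold for membrane-spanning segments; dedicated topology predictors are considerably more accurate. Function names and the toy sequence are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence (KeyError on non-standard residues)."""
    values = [KD[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def predicted_tm_segments(protein, window=19, threshold=1.6):
    """Merge consecutive above-threshold windows into (start, end) spans, 0-based, end exclusive."""
    profile = hydropathy_profile(protein, window)
    segments = []
    start = None
    for i, score in enumerate(profile):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            # Last qualifying window began at i - 1 and covers window residues.
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


toy = "K" * 30 + "L" * 25 + "K" * 30  # hydrophilic / hydrophobic / hydrophilic toy sequence
print(predicted_tm_segments(toy))  # [(25, 60)]
```

Counting the returned spans for a real protein gives the kind of "10 predicted transmembrane segments" estimate used above for orf9, subject to the usual caveats of window-based methods.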

In summary, three putative transferase genes and an O antigen polymerase gene were identified at map positions 1 to 3,020 and 9,982 to 14,516 of the E. coli O111 O antigen gene cluster. A search of GenBank has shown that there are no genes with significant similarity at the nucleotide sequence level for two of the three putative transferase genes or for the polymerase gene. SEQ ID NO:1 and Figure 7 provide the nucleotide sequence of the O111 antigen gene cluster.
Materials and Methods - part 3

A. PCR amplification of the O157 antigen gene cluster from an E. coli O157:H7 strain (Strain C664-1992, from Statens Serum Institut, 5 Artillerivej, 2300 Copenhagen S, Denmark)

The E. coli O157 O antigen gene cluster was amplified by long PCR [Cheng et al. 1994, "Effective amplification of long targets from cloned inserts and human and genomic DNA" P.N.A.S. USA 91: 5695-569] with one primer (primer #412: att ggt agc tgt aag cca agg gcg gta gcg t) based on the JumpStart sequence usually found in the promoter region of O antigen gene clusters [Hobbs, et al. 1994 "The JumpStart sequence: a 39 bp element common to several polysaccharide gene clusters" Mol. Microbiol. 12: 855-856], and another primer, #482 (cac tgc cat acc gac gac gcc gat ctg ttg ctt gg), based on the gnd gene usually found downstream of the O antigen gene cluster. Long PCR was carried out using the Expand Long Template PCR System from Boehringer Mannheim (Castle Hill NSW Australia), and products, 14 kb in length, from several reactions were combined and purified using the Promega Wizard PCR Preps DNA Purification System (Madison WI USA). The PCR product was then extracted with phenol and twice with ether, precipitated with 70% ethanol, and resuspended in 40 µL of water.
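Primer length and G+C content determine suitable annealing temperatures. As an illustration, the sketch below joins the two primers as written above (spaced in triplets, with the non-DNA letter "e" in the scanned text read as "c"), checks that only A, C, G and T remain, and reports G+C content together with a rough melting temperature from the basic formula Tm = 64.9 + 41(nGC - 16.4)/N. That formula is an assumed approximation that ignores salt and primer concentration; the text does not state how annealing temperatures were chosen.

```python
def normalise_primer(spaced: str) -> str:
    """Join a primer written in spaced triplets and check it is plain DNA."""
    seq = "".join(spaced.split()).upper()
    if set(seq) - set("ACGT"):
        raise ValueError(f"non-ACGT characters in {spaced!r}")
    return seq


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Approximate Tm (deg C): 64.9 + 41 * (nGC - 16.4) / N; ignores salt and concentration."""
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)


primers = {
    "#412": "att ggt agc tgt aag cca agg gcg gta gcg t",
    "#482": "cac tgc cat acc gac gac gcc gat ctg ttg ctt gg",
}
for name, spaced in primers.items():
    seq = normalise_primer(spaced)
    print(name, len(seq), f"{100 * gc_fraction(seq):.1f}% GC", f"Tm ~{basic_tm(seq):.1f} C")
```

Nearest-neighbour thermodynamic methods give more realistic values for long primers such as these; the basic formula is used here only because it is short and transparent.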

B. Construction of a random DNase I bank 
Two aliquots containing about 150 ng of DNA each were subjected to DNase I digestion using the Novagen DNase I Shotgun Cleavage kit (Madison WI USA) with a modified protocol, as follows. Each aliquot was diluted into 45 µL of 0.05 M Tris-HCl (pH 7.5), 0.05 mg/mL BSA and 10 mM MnCl2. 5 µL of a 1:3000 or 1:4500 dilution of DNase I (Novagen) (Madison WI USA) in the same buffer was added to each tube respectively, and 10 µL of stop buffer (100 mM EDTA, 30% glycerol, 0.5% Orange G, 0.075% xylene cyanol) (Novagen) (Madison WI USA) was added after incubation at 15°C for 5 min. The DNA from the two DNase I reaction tubes was then combined and fractionated on a 0.8% LMT agarose gel, and the gel segment with DNA of about 1 kb in size (about 1.5 mL agarose) was excised. DNA was extracted from the agarose using Promega Wizard PCR Preps DNA Purification (Madison WI USA) and resuspended in 200 µL water, before being extracted with phenol and twice with ether, and precipitated. The DNA was then resuspended in 17.25 µL water and subjected to T4 DNA polymerase repair and single dA tailing using the Novagen Single dA Tailing Kit (Madison WI USA). The reaction product (85 µL containing about 8 ng DNA) was then extracted once with chloroform:isoamyl alcohol (24:1) and ligated to 3 × 10⁻³ pmol pGEM-T (Promega) (Madison WI USA) in a total volume of 100 µL. Ligation was carried out overnight at 4°C, and the ligated DNA was precipitated and resuspended in 20 µL water before being electroporated into E. coli strain JM109 and plated out on BCIG-IPTG plates to give a bank.
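The insert and vector amounts given above can be compared on a molar basis. The sketch below converts about 8 ng of roughly 1 kb fragments to picomoles, assuming an average of 650 g/mol per base pair (a common approximation, not stated in the text) and treating every fragment as exactly 1,000 bp, then compares it with the 3 × 10⁻³ pmol of pGEM-T used.

```python
def dsdna_pmol(mass_ng: float, length_bp: int, da_per_bp: float = 650.0) -> float:
    """Convert a mass of double-stranded DNA to picomoles.

    pmol = ng * 1000 / (bp * average mass per base pair)
    """
    return mass_ng * 1000.0 / (length_bp * da_per_bp)


insert_pmol = dsdna_pmol(8.0, 1000)  # ~8 ng of ~1 kb fragments (from the text)
vector_pmol = 3e-3                   # pGEM-T, from the text
print(f"insert {insert_pmol:.4f} pmol, insert:vector ~{insert_pmol / vector_pmol:.1f}:1")
```

Under these assumptions the ligation contains roughly a four-fold molar excess of insert ends over vector; the exact figure shifts with the size distribution of the excised fragments.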

C. Sequencing 

DNA templates from clones of the bank were prepared for sequencing using the 96-well format plasmid DNA miniprep kit from Advanced Genetic Technologies Corp (Gaithersburg MD USA). The inserts of these clones were sequenced from one or both ends using the standard M13 sequencing primer sites located in the pGEM-T vector. Sequencing was carried out on an ABI377 automated sequencer (CA USA) as described above, after carrying out the sequencing reaction on an ABI Catalyst (CA USA). Sequence gaps and areas of inadequate coverage were PCR amplified directly from O157 chromosomal DNA using primers based on the sequencing data already obtained, and sequenced using the standard M13 sequencing primer sites attached to the PCR primers.
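Gaps are expected with random shotgun sampling, which is why targeted PCR was needed to finish the sequence. As an illustration only, the sketch below uses the Lander-Waterman model (not mentioned in the text): with N reads of length L over a target of G bases, coverage is c = NL/G, the expected uncovered fraction is e^(-c), and the expected number of contigs is about N e^(-c). The read count and read length are hypothetical; only the ~14 kb target size comes from the text.

```python
import math


def shotgun_expectations(genome_bp: int, n_reads: int, read_bp: int):
    """Lander-Waterman expectations for random shotgun sequencing.

    Ignores the minimum overlap needed to detect joins, so the contig
    count is optimistic.
    """
    coverage = n_reads * read_bp / genome_bp
    uncovered_fraction = math.exp(-coverage)
    expected_contigs = n_reads * math.exp(-coverage)
    return coverage, uncovered_fraction, expected_contigs


# Hypothetical numbers: a ~14 kb insert and 100 reads of 400 bases.
cov, gap_frac, contigs = shotgun_expectations(14_000, 100, 400)
print(f"coverage {cov:.2f}x, ~{gap_frac * 14_000:.0f} bp uncovered, ~{contigs:.1f} contigs")
```

Because the uncovered fraction falls only exponentially with coverage, closing the last few gaps by directed PCR is usually cheaper than sequencing many more random clones.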

D. Analysis of the E. coli O157 O antigen gene cluster nucleotide sequence data

Sequence data were processed and analysed using the Staden programs [Staden, R., 1982 "Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing." Nuc. Acid Res. 10: 4731-4751; Staden, R., 1986 "The current status and portability of our sequence handling software". Nuc. Acid Res. 14: 217-231; Staden, R. 1982 "An interactive graphics program for comparing and aligning nucleic acid and amino acid sequence". Nuc. Acid Res. 10: 2951-2961]. Figure 4 shows the structure of the E. coli O157 O antigen gene cluster. Twelve open reading frames were predicted from the sequence data, and the nucleotide and amino acid sequences of all these genes were then used to search the GenBank database for indications of the possible function and specificity of these genes. The position of each gene is listed in Table 6. The nucleotide sequence is presented in SEQ ID NO:2 and Figure 8.
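The text does not state the criteria used to predict the twelve open reading frames. A minimal six-frame scan is sketched below, assuming ATG starts, the three standard stop codons and a hypothetical minimum length; it takes the first ATG in each open stretch and ignores alternative start codons, so it is a simplification of what the Staden package or modern gene finders do.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Return (strand, start, end) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based, end exclusive, on the forward strand.
    min_codons counts codons including the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append(("+", start, end))
                        else:
                            # Map reverse-strand coordinates back onto the forward strand.
                            orfs.append(("-", n - end, n - start))
                    start = None
    return orfs


toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"  # toy sequence with one short ORF
print(find_orfs(toy, min_codons=3))  # [('+', 2, 23)]
```

The predicted ORFs would then be translated and compared against GenBank, as described above, to suggest a function for each.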

orf10 and orf11 showed high level identity to manC and manB and were named manC and manB respectively. orf7 showed 89% identity (at the amino acid level) to the gmd gene of the E. coli colanic acid capsule gene cluster (Stevenson, G. et al. 1996 "Organisation of the Escherichia coli K-12 gene cluster responsible for production of the extracellular polysaccharide colanic acid". J. Bacteriol. 178:4885-4893) and was named gmd. orf8 showed 79% and 69% identity (at the amino acid level) respectively to wcaG of the E. coli colanic acid capsule gene cluster and to the wbcJ (orf14.8) gene of the Yersinia enterocolitica O8 O antigen gene cluster (Zhang, L. et al. 1997 "Molecular and chemical characterization of the lipopolysaccharide O-antigen and its role in the virulence of Y. enterocolitica serotype O8". Mol. Microbiol. 23:63-76). Colanic acid and the Yersinia O8 O antigen both contain fucose, as does the O157 O antigen. Two enzymatic steps are required for GDP-L-fucose synthesis from GDP-4-keto-6-deoxy-D-mannose, the product of the reaction catalysed by the gmd gene product. However, it has been shown recently (Tonetti, M et al. 1996 "Synthesis of GDP-L-fucose by the human FX protein" J. Biol. Chem. 271:27274-27279) that the human FX protein has "significant homology" with the wcaG gene (referred to as Yefb in that paper), and that the FX protein carries out both reactions to convert GDP-4-keto-6-deoxy-D-mannose to GDP-L-fucose. We believe that this makes a very strong case for orf8 carrying out these two steps, and propose to name the gene fcl. In support of a single enzyme carrying out both functions is the observation that there are no genes other than manB, manC, gmd and fcl with similar levels of similarity between the three bacterial gene clusters for fucose-containing structures.
orf5 is very similar to wbeE (rfbE) of Vibrio cholerae O1, which is thought to be the perosamine synthetase that converts GDP-4-keto-6-deoxy-D-mannose to GDP-perosamine (Stroeher, U.H. et al. 1995 "A putative pathway for perosamine biosynthesis is the first function encoded within the rfb region of Vibrio cholerae O1". Gene 166: 33-42). The V. cholerae O1 and E. coli O157 O antigens contain perosamine and N-acetyl-perosamine respectively. The V. cholerae O1 manA, manB, gmd and wbeE genes are the only genes of the V. cholerae O1 gene cluster with significant similarity to genes of the E. coli O157 gene cluster, and we believe that our observations both confirm the prediction made for the function of wbeE of V. cholerae and show that orf5 of the O157 gene cluster encodes GDP-perosamine synthetase. orf5 is therefore named per. orf5 plus about 100 bp of the upstream region (position 4022-5308) was previously sequenced by Bilge, S.S. et al [1996 "Role of the Escherichia coli O157:H7 O side chain in adherence and analysis of an rfb locus". Infect. Immun. 64:4795-4801].

orf12 shows high level similarity to the conserved region of about 50 amino acids of various members of an acetyltransferase family (Lin, W., et al. 1994 "Sequence analysis and molecular characterisation of genes required for the biosynthesis of type 1 capsular polysaccharide in Staphylococcus aureus". J. Bacteriol. 176: 7005-7016), and we believe it is the N-acetyltransferase that converts GDP-perosamine to GDP-perNAc. orf12 has been named wbdR.
The genes manB, manC, gmd, fcl, per and wbdR account for all of the expected biosynthetic pathway genes of the O157 gene cluster.
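The pathway assignments above can be summarised as a small reaction graph from which the genes needed for each nucleotide sugar are read off. The gmd, fcl, per and wbdR steps follow the discussion above; assigning manB to the mannose-6-phosphate to mannose-1-phosphate step and manC to GDP-mannose formation is an assumption based on their usual phosphomannomutase and GDP-mannose pyrophosphorylase roles, which this passage does not spell out. Names in the code are hypothetical.

```python
from collections import deque

# (substrate, product, gene) triples; manB/manC roles are assumed, see lead-in.
REACTIONS = [
    ("mannose-6-P", "mannose-1-P", "manB"),
    ("mannose-1-P", "GDP-D-mannose", "manC"),
    ("GDP-D-mannose", "GDP-4-keto-6-deoxy-D-mannose", "gmd"),
    ("GDP-4-keto-6-deoxy-D-mannose", "GDP-L-fucose", "fcl"),
    ("GDP-4-keto-6-deoxy-D-mannose", "GDP-perosamine", "per"),
    ("GDP-perosamine", "GDP-perNAc", "wbdR"),
]


def genes_for(start, target):
    """Breadth-first search for the shortest chain of genes from start to target."""
    graph = {}
    for substrate, product, gene in REACTIONS:
        graph.setdefault(substrate, []).append((product, gene))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, genes = queue.popleft()
        if node == target:
            return genes
        for product, gene in graph.get(node, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, genes + [gene]))
    return None  # target not reachable from start


print(genes_for("mannose-6-P", "GDP-L-fucose"))  # ['manB', 'manC', 'gmd', 'fcl']
print(genes_for("mannose-6-P", "GDP-perNAc"))    # ['manB', 'manC', 'gmd', 'per', 'wbdR']
```

The shared manB-manC-gmd trunk explains why these genes are also found in other clusters containing fucose or perosamine, and hence why pathway genes are less useful than transferase, wzx and wzy genes for specific detection.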

The remaining biosynthetic step(s) required are for synthesis of UDP-GalNAc from UDP-Glc. It has been proposed (Zhang, L., et al. 1997 "Molecular and chemical characterisation of the lipopolysaccharide O-antigen and its role in the virulence of Yersinia enterocolitica serotype O8". Mol. Microbiol. 23:63-76) that in Yersinia enterocolitica UDP-GalNAc is synthesised from UDP-GlcNAc by a homologue of galactose epimerase (GalE), for which there is a galE-like gene in the Yersinia enterocolitica O8 gene cluster. In the case of O157 there is no galE homologue in the gene cluster, and it is not clear how UDP-GalNAc is synthesised. It is possible that the galactose epimerase encoded by the galE gene in the gal operon can carry out conversion of UDP-GlcNAc to UDP-GalNAc in addition to conversion of UDP-Glc to UDP-Gal. There do not appear to be any gene(s) responsible for UDP-GalNAc synthesis in the O157 gene cluster.

orf4 shows similarity to many wzx genes and is named wzx, and orf2, which shows similarity of secondary structure in the predicted protein to other wzy genes, is for that reason named wzy.
The orf1, orf3 and orf6 gene products all have characteristics of transferases, and have been named wbdN, wbdO and wbdP respectively. The O157 O antigen has 4 sugars, so 4 transferases are expected. The first transferase to act would put a sugar phosphate onto undecaprenol phosphate. The two transferases known to perform this function, WbaP (RfbP) and WecA (Rfe), transfer galactose phosphate and N-acetyl-glucosamine phosphate respectively to undecaprenol phosphate. Neither of these sugars is present in the O157 structure.

Further, none of the presumptive transferases in the O157 gene cluster has the transmembrane segments found in WecA and WbaP, which transfer a sugar phosphate to undecaprenol phosphate, and which would be expected for any protein that transferred a sugar to undecaprenol phosphate embedded within the membrane.

The wecA gene, whose product transfers GlcNAc-P to undecaprenol phosphate, is located in the Enterobacterial Common Antigen (ECA) gene cluster; it functions in ECA synthesis in most and perhaps all E. coli strains, and also in O antigen synthesis for those strains which have GlcNAc as the first sugar in the O unit.

It appears that WecA acts as the transferase for addition of GalNAc-1-P to undecaprenol phosphate for the Yersinia enterocolitica O8 O antigen [Zhang et al. 1997 "Molecular and chemical characterisation of the lipopolysaccharide O antigen and its role in the virulence of Yersinia enterocolitica serotype O8" Mol. Microbiol. 23: 63-76], and perhaps does so here, as the O157 structure includes GalNAc. WecA has also been reported to add glucose-1-phosphate to undecaprenol phosphate in E. coli O8 and O9 strains, and an alternative possibility for transfer of the first sugar to undecaprenol phosphate is WecA-mediated transfer of glucose, as there is a glucose residue in the O157 O antigen. In either case, the requisite number of transferase genes are present if GalNAc or Glc is transferred by WecA and the side chain Glc is transferred by a transferase outside of the O antigen gene cluster.

orf9 shows high level similarity (44% identity at the amino acid level, same length) with the wcaH gene of the E. coli colanic acid capsule gene cluster. The function of this gene is unknown, and we give orf9 the name wbdQ.

The DNA between manB and wbdR has strong sequence similarity to one of the H-repeat units of E. coli K-12. Both of the inverted repeat sequences flanking this region are still recognisable, each with two of the 11 bases changed. The H-repeat-associated protein-encoding gene located within this region has a 267 base deletion and mutations in various positions. It seems that the H-repeat unit has been associated with this gene cluster for a long period of time since it translocated to the gene cluster, perhaps playing a role in assembly of the gene cluster, as has been proposed in other cases.
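Saying that each flanking inverted repeat is still recognisable "with two of the 11 bases changed" amounts to a mismatch count against the reference repeat, with the right-hand copy read on the opposite strand. The actual H-repeat sequences are not given here, so the 11-base sequences below are hypothetical and serve only to show the comparison.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def repeat_mismatches(reference: str, left_copy: str, right_copy: str):
    """Mismatches of each flanking copy against the reference inverted repeat.

    The right-hand copy of an inverted repeat is read on the opposite strand,
    so it is reverse-complemented before comparison.
    """
    return hamming(reference, left_copy), hamming(reference, reverse_complement(right_copy))


# Hypothetical 11-base repeat and two degenerate copies (illustration only).
ref = "GGCCTCATCGC"
left = "GGCATCATAGC"                       # 2 substitutions
right = reverse_complement("GGCCTGATCGA")  # 2 substitutions, on the other strand
print(repeat_mismatches(ref, left, right))  # (2, 2)
```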

Materials and Methods - part 4

To test our hypothesis that O antigen genes for transferases and the wzx and wzy genes are more specific than pathway genes for diagnostic PCR, we first carried out PCR using primers for all the E. coli O16 O antigen genes (Table 4). PCR was then carried out using primers for the E. coli O111 transferase, wzx and wzy genes (Tables 5, 5A). PCR was also carried out using primers for the E. coli O157 transferase, wzx and wzy genes (Tables 6, 6A).

Chromosomal DNA from the 166 serotypes of E. coli available from Statens Serum Institut, 5 Artillerivej, 2300 Copenhagen, Denmark was isolated using the Promega Genomic isolation kit (Madison WI USA). Note that 164 of the serogroups are described by Ewing W. H.: Edwards and Ewing's "Identification of the Enterobacteriaceae" Elsevier, Amsterdam 1986, and that they are numbered 1-171, with numbers 31, 47, 67, 72, 93, 94 and 122 no longer valid. Of the two serogroup 19 strains, we used 19ab strain F8188-41. Lior H. 1994 ["Classification of Escherichia coli. In Escherichia coli in domestic animals and humans pp 31-72. Edited by C.L. Gyles, CAB International] adds two more, numbered 172 and 173, to give the 166 serogroups used. Pools containing 5 to 8 samples of DNA per pool were made. Pool numbers 1 to 19 (Table 1) were used in the E. coli O111 and O157 assays. Pool numbers 20 to 28 were also used in the O111 assay, and pool numbers 22 to 24 contained E. coli O111 DNA and were used as positive controls (Table 2). Pool numbers 29 to 42 were also used in the O157 assay, and pool numbers 31 to 36 contained E. coli O157 DNA and were used as positive controls (Table 3). Pool numbers 2 to 20, 30, 43 and 44 were used in the E. coli O16 assay (Tables 1 to 3). Pool number 44 contained DNA of E. coli K-12 strains C600 and WG1 and was used as a positive control, as between them they have all of the E. coli K-12 O16 O antigen genes.
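The serogroup count used above can be checked directly from the numbering rules given: Ewing's numbers 1 to 171 less the seven withdrawn numbers, plus the two numbers added by Lior. This is only an arithmetic check of the text, not a list of the strains used.

```python
# E. coli O serogroup numbering as described in the text.
WITHDRAWN = {31, 47, 67, 72, 93, 94, 122}

ewing = [n for n in range(1, 172) if n not in WITHDRAWN]  # numbers 1-171 minus withdrawn
all_groups = ewing + [172, 173]                            # two added by Lior (1994)
print(len(ewing), len(all_groups))  # 164 166
```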

PCR reactions were carried out under the following conditions: denaturing, 94°C/30 s; annealing, temperature varies (refer to Tables 4 to 8)/30 s; extension, 72°C/1 min; 30 cycles. The PCR reaction was carried out in a volume of 25 µL for each pool. After the PCR reaction, 10 µL of PCR product from each pool was run on an agarose gel to check for amplified DNA.
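The cycling conditions above can be captured as a small data structure, which makes the per-primer-pair annealing temperature an explicit parameter. The 55°C value in the example is a placeholder, not a value from the tables, and no initial denaturation or final extension is included because the text does not mention any. Names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


def pcr_program(annealing_c: float, cycles: int = 30):
    """Cycling conditions from the text; the annealing temperature varies per primer pair."""
    steps = [
        Step("denature", 94.0, 30),
        Step("anneal", annealing_c, 30),
        Step("extend", 72.0, 60),
    ]
    return steps, cycles


def hold_time_minutes(steps, cycles: int) -> float:
    """Total time at the set temperatures, ignoring ramping between steps."""
    return cycles * sum(step.seconds for step in steps) / 60.0


steps, cycles = pcr_program(annealing_c=55.0)  # 55 C is a placeholder value
print(hold_time_minutes(steps, cycles))  # 60.0
```

Keeping the annealing temperature as the only variable mirrors the way the tables list one temperature per primer pair while the rest of the programme stays fixed.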

Each E. coli and S. enterica chromosomal DNA sample was checked by gel electrophoresis for the presence of chromosomal DNA, and by PCR amplification of the E. coli or S. enterica mdh gene using oligonucleotides based on E. coli K-12 or S. enterica LT2 sequences [Boyd et al. (1994) "Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica" Proc. Natl. Acad. Sci. USA 91: 1280-1284]. Chromosomal DNA samples from other bacteria were only checked by gel electrophoresis of chromosomal DNA.

A. Primers based on the E. coli O16 O antigen gene cluster sequence.

The O antigen gene cluster of E. coli O16 was the only typical E. coli O antigen gene cluster that had been fully sequenced prior to that of O111, and we chose it for testing our hypothesis. One pair of primers for each gene was tested against pools 2 to 20, 30 and 43 of E. coli chromosomal DNA. The primers, annealing temperatures and functional information for each gene are listed in Table 4.

For the five pathway genes, there were 17/21, 13/21, 0/21, 0/21 and 0/21 positive pools for rmlB, rmlD, rmlA, rmlC and glf respectively (Table 4). For the wzx, wzy and three transferase genes there were no positives amongst the 21 pools of E. coli chromosomal DNA tested (Table 4). In each case, pool #44 gave a positive result.
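The O16 results can be tallied to show which genes cross-react with other serogroups. The counts below are the named-gene results given above (positive pools out of 21); the three transferase genes, also 0/21, are left out only because their names are not given in this passage, and the positive control pool #44 is not modelled. Names in the code are hypothetical.

```python
# Positive pools out of 21 for E. coli O16 genes, as reported in the text.
O16_RESULTS = {
    "rmlB": 17, "rmlD": 13, "rmlA": 0, "rmlC": 0, "glf": 0,  # pathway genes
    "wzx": 0, "wzy": 0,                                      # flippase and polymerase
}
POOLS_TESTED = 21


def cross_reacting(results, pools=POOLS_TESTED):
    """Genes whose primers amplified from any non-O16 pool, with the positive fraction."""
    return {gene: hits / pools for gene, hits in results.items() if hits > 0}


for gene, frac in cross_reacting(O16_RESULTS).items():
    print(f"{gene}: {frac:.0%} of pools positive")
```

Only pathway genes appear in the output, which is the pattern the hypothesis predicts, although three pathway genes (rmlA, rmlC and glf) were also negative in this screen.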
B. Primers based on the E. coli O111 O antigen gene cluster sequence.

One to four pairs of primers for each of the transferase, wzx and wzy genes of O111 were tested against pools 1 to 21 of E. coli chromosomal DNA (Table 5). For wbdH, four pairs of primers, which bind to various regions of this gene, were tested and found to be specific for O111, as there was no amplified DNA of the correct size in any of the 21 pools of E. coli chromosomal DNA tested. Three pairs of primers for wbdM were tested, and they are all specific, although primers #985/#986 produced a band of the wrong size from one pool. Three pairs of primers for wzx were tested and all were specific. Two pairs of primers were tested for wzy; both are specific, although #980/#983 gave a band of the wrong size in all pools. One pair of primers for wbdL was tested and found to be non-specific, and therefore no further test was carried out. Thus, wzx, wzy and two of the three transferase genes are highly specific to O111. Bands of the wrong size found in amplified DNA are assumed to be due to chance hybridisation of genes widely present in E. coli. The primers, annealing temperatures and positions for each gene are given in Table 5.
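Whether a primer pair gives a product of the predicted size from a known sequence can be checked in silico before testing pools. The sketch below looks only for exact matches on one strand: the forward primer as written, and the reverse complement of the reverse primer further downstream. Real PCR tolerates mismatches, which is why chance bands of the wrong size arise; the sketch does not model that. The template and primers are hypothetical toy sequences.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_products(template: str, forward: str, reverse: str, max_len: int = 5000):
    """Sizes of products from exact primer matches on one strand of a template.

    The forward primer must match the template as written; the reverse primer
    (written 5'->3') must have its reverse complement downstream of the forward site.
    """
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)
    sizes = []
    i = template.find(fwd)
    while i != -1:
        j = template.find(rev_site, i + len(fwd))
        while j != -1:
            size = j + len(rev_site) - i
            if size <= max_len:
                sizes.append(size)
            j = template.find(rev_site, j + 1)
        i = template.find(fwd, i + 1)
    return sizes


template = "AAAA" + "ACGTACGTAC" + "T" * 50 + "GGGCCCAAAT" + "CCCC"  # toy sequence
forward = "ACGTACGTAC"
reverse = reverse_complement("GGGCCCAAAT")
print(in_silico_products(template, forward, reverse))  # [70]
```

Run against a sequence database, a check like this flags exact off-target sites, but only laboratory testing, as described above, reveals the mismatch-tolerant cross-reactions.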

The O111 assay was also performed using pools including DNA from O antigen-expressing Yersinia pseudotuberculosis, Shigella boydii and Salmonella enterica strains (Table 5A). None of the oligonucleotides derived from wbdH, wzx, wzy or wbdM gave amplified DNA of the correct size with these pools. Notably, pool number 25 includes S. enterica Adelaide, which has the same O antigen as E. coli O111: this pool did not give a positive PCR result for any primers tested, indicating that these genes are highly specific for E. coli O111.

Each of the 12 pairs binding to wbdH, wzx, wzy and wbdM produces a band of the predicted size with the pools containing O111 DNA (pool numbers 22 to 24). As pools 22 to 24 included DNA from all strains present in pool 21 plus O111 strain DNA (Table 2), we conclude that the 12 pairs of primers all give a positive PCR test with each of three unrelated O111 strains but not with any other strains tested. Thus these genes are highly specific for E. coli O111.

C. Primers based on the E. coli O157 O antigen gene cluster sequence.

Two or three primer pairs for each of the transferase, wzx and wzy genes of O157 were tested against E. coli chromosomal DNA of pools 1 to 19, 29 and 30 (Table 6). For wbdN, three pairs of primers, which bind to various regions of this gene, were tested and found to be specific for O157, as there was no amplified DNA in any of the 21 pools of E. coli chromosomal DNA tested. Three pairs of primers for wbdO were tested, and they are all specific, although primers #1211/#1212 produced two or three bands of the wrong size from all pools. Three pairs of primers were tested for wbdP and all were specific. Two pairs of primers were tested for wbdR and both were specific. For wzy, three pairs of primers were tested and all were specific, although primer pair #1203/#1204 produced one or three bands of the wrong size in each pool. For wzx, two pairs of primers were tested and both were specific, although primer pair #1217/#1218 produced 2 bands of the wrong size in 2 pools, and 1 band of the wrong size in 7 pools. Bands of the wrong size found in amplified DNA are assumed to be due to chance hybridisation of genes widely present in E. coli. The primers, annealing temperatures and functional information for each gene are given in Table 6.

The O157 assay was also performed using pools 37 to 42, including DNA from O antigen-expressing Yersinia pseudotuberculosis, Shigella boydii, Yersinia enterocolitica O9, Brucella abortus and Salmonella enterica strains (Table 6A). None of the oligonucleotides derived from wbdN, wzy, wbdO, wzx, wbdP or wbdR reacted specifically with these pools, except that primer pair #1203/#1204 produced two bands with Y. enterocolitica O9, one of which is the same size as that from the positive control. Primer pair #1203/#1204 binds to wzy. The predicted secondary structures of Wzy proteins are generally similar, although there is very low similarity at the amino acid or DNA level among the sequenced wzy genes. Thus, it is possible that Y. enterocolitica O9 has a wzy gene closely related to that of E. coli O157. It is also possible that this band is due to chance hybridisation of another gene, as the other two wzy primer pairs (#1205/#1206 and #1207/#1208) did not produce any band with Y. enterocolitica O9. Notably, pool number 37 includes S. enterica Landau, which has the same O antigen as E. coli O157, and pools 38 and 39 contain DNA of B. abortus and Y. enterocolitica O9, which cross-react serologically with E. coli O157. This result indicates that these genes are highly O157 specific, although one primer pair may have cross-reacted with Y. enterocolitica O9.

Each of the 16 pairs binding to wbdN, wzx, wzy, wbdO, wbdP and wbdR produces a band of the predicted size with the pools containing O157 DNA (pool numbers 31 to 36). As pool 29 included DNA from all strains present in pools 31 to 36 other than the O157 strain DNA (Table 3), we conclude that the 16 pairs of primers all give a positive PCR test with each of the five unrelated O157 strains.

Thus PCR using primers based on the genes wbdN, wzy, wbdO, wzx, wbdP and wbdR is highly specific for E. coli O157, giving positive results with each of six unrelated O157 strains, while only one primer pair gave a band of the expected size with one of three strains with O antigens known to cross-react serologically with E. coli O157.

D. Primers based on the Salmonella enterica serotype C2 and B O antigen gene cluster sequences.
We also performed PCR using primers for the S. enterica C2 and B serogroup transferase, wzx, wzy and pathway genes (Tables 7 to 9). The nucleotide sequences of the C2 and B O antigen gene clusters are listed as SEQ ID NO:3 (Fig. 9) and SEQ ID NO:4 (Fig. 10) respectively. Chromosomal DNA from all 46 serotypes of Salmonella enterica (Table 9) was isolated using the Promega Genome isolation kit, and 7 pools of 4 to 8 samples per pool were made. Salmonella enterica serotype B or C2 DNA was omitted from the pools used for testing primers of the respective serotype, but added to a pool containing 6 other samples to give pool number 8 for use as a positive control.

PCR reactions were carried out under the following conditions: denaturing, 94°C/30 s; annealing, temperature varies (see below)/30 s; extension, 72°C/1 min; 30 cycles. The PCR reaction was carried out in a volume of 25 µL for each pool. After the PCR reaction, 10 µL of PCR product from each pool was run on an agarose gel to check for amplified DNA. For pools which gave a band of the correct size, PCR was repeated using the individual chromosomal DNA samples of that pool, and an agarose gel was run to check for amplified DNA from each sample.
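Testing pools first and retesting individuals only from positive pools is a two-stage pooled screen, and it saves reactions when few samples are positive. The sketch below shows the bookkeeping; `is_positive` is a hypothetical stand-in for a PCR on pooled DNA, the pools are simply consecutive groups of at most 8 (the real pool assignments are in the tables and are not reproduced), and the two positive serogroups in the example are invented for illustration.

```python
def make_pools(samples, max_size=8):
    """Split samples into consecutive pools of at most max_size."""
    return [samples[i:i + max_size] for i in range(0, len(samples), max_size)]


def two_stage_screen(samples, is_positive, max_size=8):
    """Test each pool, then retest individuals only in pools that were positive.

    `is_positive(group)` stands in for a PCR on pooled DNA: it returns True if
    any member of the group gives a correctly sized band.
    Returns (positive samples, number of PCRs run).
    """
    hits, reactions = [], 0
    for pool in make_pools(samples, max_size):
        reactions += 1
        if is_positive(pool):
            for sample in pool:
                reactions += 1
                if is_positive([sample]):
                    hits.append(sample)
    return hits, reactions


# Hypothetical example: 45 serogroups, of which two carry the target gene.
serogroups = [f"S{n}" for n in range(1, 46)]
carriers = {"S7", "S30"}
hits, n_pcr = two_stage_screen(serogroups, lambda group: any(s in carriers for s in group))
print(hits, n_pcr)  # ['S7', 'S30'] 22
```

Here 22 reactions replace 45 individual tests; the saving shrinks as the number of positive pools grows, which suits primers expected to react with few or no serogroups.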

The S. enterica serotype B O antigen gene cluster (of strain LT2) was the first O antigen gene cluster to be fully sequenced, and the function of each gene has been identified experimentally [Jiang, X. M., Neal, B., Santiago, F., Lee, S. J., Romana, L. K. and Reeves, P. R. (1991). "Structure and sequence of the rfb (O antigen) gene cluster of Salmonella serovar typhimurium (strain LT2)." Mol. Microbiol. 5(3), 695-713; Liu, D., Cole, R. and Reeves, P. R. (1996). "An O antigen processing function for Wzx (RfbX): a promising candidate for O-unit flippase." J. Bacteriol. 178(7), 2102-2107; Liu, D., Haase, A. M., Lindqvist, L., Lindberg, A. A. and Reeves, P. R. (1993). "Glycosyl transferases of O-antigen biosynthesis in S. enterica: identification and characterisation of transferase genes of groups B, C2 and E1." J. Bacteriol. 175, 3408-3413; Liu, D., Lindqvist, L. and Reeves, P. R. (1995). "Transferases of O-antigen biosynthesis in Salmonella enterica: dideoxyhexosyl transferases of groups B and C2 and acetyltransferase of group C2." J. Bacteriol. 177, 4084-4088; Romana, L. K., Santiago, F. S. and Reeves, P. R. (1991). "High level expression and purification of dThymidine-diphospho-D-glucose 4,6-dehydratase (rfbB) from Salmonella serovar typhimurium LT2." BBRC 174, 846-852]. One pair of primers for each of the pathway genes and wbaP was tested against the pools of Salmonella enterica DNA, and two to three pairs of primers for each of the other transferase genes and wzx were also tested. See Table 8 for a list of primers and functional information for each gene, as well as the annealing temperature of the PCR reaction for each pair of primers.

For the pathway genes of group B strain LT2, there were 19/45, 14/45, 15/45, 12/45, 6/45, 6/45, 6/45, 6/45, 1/45, 9/45 and 8/45 positives for rmlB, rmlD, rmlA, rmlC, ddhD, ddhA, ddhB, ddhC, abe, manC and manB respectively (Table

For the LT2 wzx gene we used three primer pairs, each of which gave 1/45 positives. For the 4 transferase genes we used a total of 9 primer pairs. The 2 primer pairs for wbaV gave 2/90 positives. For the 3 primer pairs of wbaN, 11/135 tests gave a positive result. For the wbaP primer pair, 10/45 gave a positive result (Table 9).

The experimental data show that oligonucleotides derived from the wzx and wbaV group B O antigen genes are specific for the group B O antigen amongst all 45 Salmonella enterica O antigen groups, except O group 67. The oligonucleotides derived from the Salmonella enterica group B wbaN and wbaU genes detected the group B O antigen and also produced positive results with groups A, D1 and D3. wbaU encodes a transferase for a mannose α(1-4) mannose linkage and is expressed in groups A, B and D1, while wbaN, which encodes a transferase for a rhamnose α(1-3) galactose linkage, is present in groups A, B, D1, D2, D3 and E1. This accounts for the positive results with the group B wbaU and wbaN genes. The wbaN gene of groups E and D2 has considerable sequence differences from that of groups A, B, D1 and D3, and this accounts for the positive results only with groups A, B, D1 and D3.

The Salmonella enterica B primers derived from wzx and transferase genes produced a positive result with Salmonella enterica O67. We find that Salmonella enterica O67 has all the genes of the group B O antigen gene cluster. There are several possible explanations for this finding, including the possibility that the gene cluster is not functional due to mutation and the O67 antigenicity is due to another antigen, or that the O antigen is modified after synthesis such that its antigenicity is changed. Salmonella enterica O67 would therefore be scored as Salmonella enterica group B in the PCR diagnostic assay. However, this is of little importance, because Salmonella enterica O67 is a rare O antigen and only one (serovar Crossness) of the 2324 known serovars has the O67 serotype [Popoff M.Y. et al (1992) "Antigenic formulas of the Salmonella enterica serovars" 6th revision WHO Collaborating Centre for Reference and Research on Salmonella enterica, Institut Pasteur Paris France], and serovar Crossness had only been isolated once [M. Popoff, personal communication].

The Salmonella enterica group B primers derived from wbaP
reacted with O antigen groups A, C2, D1, D2, D3, E1, 54, 55,
67 and E4. WbaP encodes the galactosyl transferase which
initiates O unit synthesis by transferring galactose phosphate
to the lipid carrier undecaprenol phosphate. This reaction is
common to the synthesis of several O antigens. As such, wbaP
is distinguished from the other transferases of the invention
in that it does not make a linkage within an O antigen.

We also tested 20 primer pairs for the wzx, wzy and 5
transferase genes of serotype C2 and found no positives in
any of the 7 pools (Table 7).
Groups A, B, D1, D2, D3, C2 and E1 share many genes.
Some of these genes occur with more than one
sequence, in which case each specific sequence can be named
after one of the serogroups in which it occurs. The
distribution of these sequence specificities is shown in
Table 10. The inventors have aligned the nucleotide
sequences of the Salmonella enterica wzy, wzx and
transferase genes so as to determine specific combinations
of nucleic acid molecules which can be employed to
specifically detect and identify the Salmonella enterica
groups A, B, D1, D2, D3, C2 and E1 (Table 10). The
results show that many of the O antigen groups can be
detected and identified using a single specific nucleic
acid molecule, although other groups, in particular D2 and
E1, and A and D1, require a panel of nucleic acid molecules
derived from a combination of genes.
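Why some groups need a panel can be illustrated by treating each probe as a presence/absence test and asking which groups share the same expected result pattern. The sketch below is hypothetical: it uses only the four group B-derived probes discussed in this section, with the group ranges stated above, not the combinations of Table 10, and it leaves out S. enterica O67. Under those assumptions the panel cannot separate A from D1, or C2, D2 and E1 from one another, which is consistent with the statement that such groups need probes from a combination of genes.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Illustrative panel built ONLY from the group B-derived probes described in this
# section; it is not the Table 10 panel. Each probe maps to the groups expected to react.
PANEL: Dict[str, FrozenSet[str]] = {
    "wzx(B)": frozenset({"B"}),
    "wbaV(B)": frozenset({"B"}),
    "wbaU(B)": frozenset({"A", "B", "D1"}),
    "wbaN(B)": frozenset({"A", "B", "D1", "D3"}),
}
GROUPS: List[str] = ["A", "B", "C2", "D1", "D2", "D3", "E1"]


def expected_profile(group: str) -> Tuple[bool, ...]:
    """Expected positive/negative result for each probe, in PANEL order."""
    return tuple(group in targets for targets in PANEL.values())


def candidate_groups(observed: Dict[str, bool]) -> List[str]:
    """Groups whose expected profile matches an observed set of PCR results."""
    key = tuple(observed[probe] for probe in PANEL)
    return [g for g in GROUPS if expected_profile(g) == key]


def unresolved_sets() -> List[List[str]]:
    """Groups this panel cannot tell apart because they share a profile."""
    by_profile: Dict[Tuple[bool, ...], List[str]] = defaultdict(list)
    for g in GROUPS:
        by_profile[expected_profile(g)].append(g)
    return [gs for gs in by_profile.values() if len(gs) > 1]


if __name__ == "__main__":
    all_positive = {probe: True for probe in PANEL}
    print(candidate_groups(all_positive))  # ['B']
    print(unresolved_sets())  # [['A', 'D1'], ['C2', 'D2', 'E1']]
```

Adding a probe whose target set differs between the members of an unresolved set (for example, one specific to A but not D1) would split that set; this is the role the panels of Table 10 play.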

It will be understood that, in carrying out the
methods of the invention on particular sample types,
including samples from food, patients and faeces, the
samples are prepared by techniques routinely used to
prepare such samples for DNA-based testing.
TABLE 1

Pool No. | Strains of which chromosomal DNA was included in the pool | Source*
1 | E. coli type strains for O serotypes 1, 2, 3, 4, 10, 16, 18 and 39 | IMVS (a)
2 | E. coli type strains for O serotypes 40, 41, 48, 49, 71, 73, 88 and 100 | IMVS
3 | E. coli type strains for O serotypes 102, 109, 119, 120, 121, 125, 126 and 137 | IMVS
4 | E. coli type strains for O serotypes 138, 139, 149, 7, 5, 6, 11 and 12 | IMVS
5 | E. coli type strains for O serotypes 13, 14, 15, 17, 19ab, 20, 21 and 22 | IMVS
6 | E. coli type strains for O serotypes 23, 24, 25, 26, 27, 28, 29 and 30 | IMVS
7 | E. coli type strains for O serotypes 32, 33, 34, 35, 36, 37, 38 and 42 | IMVS
8 | E. coli type strains for O serotypes 43, 44, 45, 46, 50, 51, 52 and 53 | IMVS
9 | E. coli type strains for O serotypes 54, 55, 56, 57, 58, 59, 60 and 61 | IMVS
10 | E. coli type strains for O serotypes 62, 63, 64, 65, 66, 68, 69 and 70 | IMVS
11 | E. coli type strains for O serotypes 74, 75, 76, 77, 78, 79, 80 and 81 | IMVS
12 | E. coli type strains for O serotypes 82, 83, 84, 85, 86, 87, 89 and 90 | IMVS
13 | E. coli type strains for O serotypes 91, 92, 95, 96, 97, 98, 99 and 101 | IMVS
14 | E. coli type strains for O serotypes 103, 104, 105, 106, 107, 108 and 110 | IMVS
15 | E. coli type strains for O serotypes 112, 162, 113, 114, 115, 116, 117 and 118 | IMVS
16 | E. coli type strains for O serotypes 123, 165, 166, 167, 168, 169, 170 and 171 | See b
17 | E. coli type strains for O serotypes 172, 173, 127, 128, 129, 130, 131 and 132 | See c
18 | E. coli type strains for O serotypes 133, 134, 135, 136, 140, 141, 142 and 143 | IMVS
19 | E. coli type strains for O serotypes 144, 145, 146, 147, 148, 150, 151 and 152 | IMVS

a. Institute of Medical and Veterinary Science, Adelaide, Australia
b. 123 from IMVS; the rest from Statens Serum Institut, Copenhagen, Denmark
c. 172 and 173 from Statens Serum Institut, Copenhagen, Denmark; the rest from IMVS
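Because each pool in Table 1 combines up to eight type strains, a positive PCR result with a pool only narrows the answer to the serotypes in that pool. The sketch below transcribes Table 1 into a Python dictionary, checks that no serotype was placed in two pools, and lists the candidate serotypes for a set of positive pools. The function names are illustrative, and the idea of then testing the candidates individually is an assumption, not a procedure described in this section.

```python
from typing import Dict, Iterable, List

# Pool composition transcribed from Table 1 (E. coli O serotype type strains).
TABLE1_POOLS: Dict[int, List[str]] = {
    1: "1 2 3 4 10 16 18 39".split(),
    2: "40 41 48 49 71 73 88 100".split(),
    3: "102 109 119 120 121 125 126 137".split(),
    4: "138 139 149 7 5 6 11 12".split(),
    5: "13 14 15 17 19ab 20 21 22".split(),
    6: "23 24 25 26 27 28 29 30".split(),
    7: "32 33 34 35 36 37 38 42".split(),
    8: "43 44 45 46 50 51 52 53".split(),
    9: "54 55 56 57 58 59 60 61".split(),
    10: "62 63 64 65 66 68 69 70".split(),
    11: "74 75 76 77 78 79 80 81".split(),
    12: "82 83 84 85 86 87 89 90".split(),
    13: "91 92 95 96 97 98 99 101".split(),
    14: "103 104 105 106 107 108 110".split(),
    15: "112 162 113 114 115 116 117 118".split(),
    16: "123 165 166 167 168 169 170 171".split(),
    17: "172 173 127 128 129 130 131 132".split(),
    18: "133 134 135 136 140 141 142 143".split(),
    19: "144 145 146 147 148 150 151 152".split(),
}


def serotype_to_pool(pools: Dict[int, List[str]]) -> Dict[str, int]:
    """Invert pool -> serotypes, refusing any serotype that sits in two pools."""
    index: Dict[str, int] = {}
    for pool, serotypes in pools.items():
        for serotype in serotypes:
            if serotype in index:
                raise ValueError(f"O{serotype} is in pools {index[serotype]} and {pool}")
            index[serotype] = pool
    return index


def candidate_serotypes(positive_pools: Iterable[int]) -> List[str]:
    """Serotypes that could account for a PCR product in the given pools."""
    return [s for pool in sorted(positive_pools) for s in TABLE1_POOLS[pool]]


if __name__ == "__main__":
    index = serotype_to_pool(TABLE1_POOLS)
    print(len(index), "serotypes in", len(TABLE1_POOLS), "pools")
    print(candidate_serotypes([14]))
```

Serotypes are kept as strings because Table 1 includes the non-numeric designation 19ab.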
TABLE 2

Pool No. | Strains of which chromosomal DNA was included in the pool | Source*
20 | E. coli type strains for O serotypes 153, 154, 155, 156, 157, 158, 159 and 160 | IMVS
21 | E. coli type strains for O serotypes 161, 163, 164, 8, 9 and 124 | IMVS
22 | As pool #21, plus E. coli O111 type strain Stoke W | IMVS
23 | As pool #21, plus E. coli O111:H2 strain C1250-1991 | See d
24 | As pool #21, plus E. coli O111:H12 strain C156-1989 | See e
25 | As pool #21, plus S. enterica serovar Adelaide | See f
26 | Y. pseudotuberculosis strains of O groups IA, IIA, IIB, IIC, III, IVA, IVB, VA, VB, VI and VII | See g
27 | S. boydii strains of serogroups 1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 14 and 15 | See h
28 | S. enterica strains of serovars (each representing a different O group) Typhi, Montevideo, Ferruch, Jangwani, Raus, Hvittingfoss, Waycross, Dan, Dugbe, Basel, 65:i:e,n,z15 and 52:d:e,n,x,z15 | IMVS

d. C1250-1991 from Statens Serum Institut, Copenhagen, Denmark
e. C156-1989 from Statens Serum Institut, Copenhagen, Denmark
f. S. enterica serovar Adelaide from IMVS
g. Dr S Aleksic of Institute of Hygiene, Germany
h. Dr J Lefebvre of Bacterial Identification Section, Laboratoire de Santé Publique du Québec, Canada
Pool No. | Strains of which chromosomal DNA was included in the pool | Source*
29 | E. coli type strains for O serotypes 153, 154, 155, 156, 158, 159 and 160 | IMVS
30 | E. coli type strains for O serotypes 161, 163, 164, 8, 9, 111 and 124 | IMVS
31 | As pool #29, plus E. coli O157 type strain A2 (O157:H19) | IMVS
32 | As pool #29, plus E. coli O157:H16 strain C475-89 | See d
33 | As pool #29, plus E. coli O157:H45 strain C727-89 | See d
34 | As pool #29, plus E. coli O157:H2 strain C252-94 | See d
35 | As pool #29, plus E. coli O157:H39 strain C258-94 | See d
36 | As pool #29, plus E. coli O157:H26 | See e
37 | As pool #29, plus S. enterica serovar Landau | See f
38 | As pool #29, plus Brucella abortus | See g
39 | As pool #29, plus Y. enterocolitica O9 | See h
40 | Y. pseudotuberculosis strains of O groups IA, IIA, IIB, IIC, III, IVA, IVB, VA, VB, VI and VII | See i
41 | S. boydii strains of serogroups 1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 14 and 15 | See j
42 | S. enterica strains of serovars (each representing a different O group) Typhi, Montevideo, Ferruch, Jangwani, Raus, Hvittingfoss, Waycross, Dan, Dugbe, Basel, 65:i:e,n,z15 and 52:d:e,n,x,z15 | IMVS
43 | E. coli type strains for O serotypes 1, 2, 3, 4, 10, 18 and 29 | IMVS
44 | As pool #43, plus E. coli K-12 strains C600 and WG1 | IMVS; see k

d. O157 strains from Statens Serum Institut, Copenhagen, Denmark
e. O157:H26 from Dr R Brown of Royal Children's Hospital, Melbourne, Victoria
f. S. enterica serovar Landau from Dr M Popoff of Institut Pasteur, Paris, France
g. B. abortus from the culture collection of The University of Sydney, Sydney, Australia
h. Y. enterocolitica O9 from Dr K. Bettelheim of Victorian Infectious Diseases Reference Laboratory, Victoria, Australia
i. Dr S Aleksic of Institute of Hygiene, Germany
j. Dr J Lefebvre of Bacterial Identification Section, Laboratoire de Santé Publique du Québec, Canada
k. Strains C600 and WG1 from Dr B.J. Bachmann of Department of Biology, Yale University, USA
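Pools 31 to 39 in the table above each consist of pool #29 plus one additional strain, so a result that is positive for one of these pools but negative for pool #29 can be attributed to the added strain. The sketch below encodes that reading of the table; the function name `attribute_positives` and the example results are hypothetical and are not data from this document.

```python
from typing import Dict, List

BACKGROUND_POOL = 29  # pools 31-39 are each "as pool #29, plus" one extra strain

ADDED_STRAIN: Dict[int, str] = {
    31: "E. coli O157 type strain A2 (O157:H19)",
    32: "E. coli O157:H16 strain C475-89",
    33: "E. coli O157:H45 strain C727-89",
    34: "E. coli O157:H2 strain C252-94",
    35: "E. coli O157:H39 strain C258-94",
    36: "E. coli O157:H26",
    37: "S. enterica serovar Landau",
    38: "Brucella abortus",
    39: "Y. enterocolitica O9",
}


def attribute_positives(results: Dict[int, bool]) -> List[str]:
    """Return the added strains whose pools were positive.

    Attribution is only meaningful if background pool #29 was tested and negative;
    otherwise the product could come from the strains shared by every pool.
    """
    if results.get(BACKGROUND_POOL, True):
        raise ValueError("pool #29 is positive or untested; cannot attribute results")
    return [ADDED_STRAIN[pool] for pool in sorted(ADDED_STRAIN) if results.get(pool, False)]


if __name__ == "__main__":
    # Hypothetical outcome for an O157-specific primer pair; not data from this document.
    example = {BACKGROUND_POOL: False, **{pool: pool <= 36 for pool in range(31, 40)}}
    for strain in attribute_positives(example):
        print(strain)
```

Treating an untested background pool as a failure, rather than as negative, avoids attributing a product to an added strain when the shared background was never checked.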
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Reeves, Peter R 
Wang, Lei 

(ii) TITLE OF INVENTION: Nucleic Acid Molecules Specific For 
Bacterial Antigens And Uses Thereof 



(iii) NUMBER OF SEQUENCES: 4 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Thomas Gumley 

(B) STREET: 168 Walker Street 

(C) CITY: North Sydney 

(D) STATE: New South Wales 

(E) COUNTRY: Australia 

(F) ZIP : 2068 



(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30



(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY / AGENT INFORMATION: 
(A) NAME: Gumley, Thomas P 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 99575944 

(B) TELEFAX: 99576288 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14516 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: YES

(v) ORIGINAL SOURCE: 

(A) ORGANISM: Escherichia coli



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
GATCTGATGG CCGTAGGGCG CTACGTGCTT TCTGCTGATA TCTGGGCTGA GTTGGAAAAA 60 
ACTGCTCCAG GTGCCTGGGG ACGTATTCAA CTGACTGATG CTATTGCAGA GTTGGCTAAA 
AAACAGTCTG TTGATGCCAT GCTGATGACC GGCGACAGCT ACGACTGCGG TAAGAAGATG 
GGCTATATGC AGGCATTCGT TAAGTATGGG CTGCGCAACC TTAAAGAAGG GGCGAAGTTC 240

CGTAAGAGCA TCAAGAAGCT ACTGAGTGAG TAGAGATTTA CACGTCTTTG TGACGATAAG 300

CCAGAAAAAA TAGCGGCAGT TAACATCCAG GCTTCTATGC TTTAAGCAAT GGAATGTTAC 360 

TGCCGTTTTT TATGAAAAAT GACCAATAAT AACAAGTTAA CCTACCAAGT TTAATCTGCT 420

TTTTGTTGGA TTTTTTCTTG TTTCTGGTCG CATTTGGTAA GACAATTAGC GTGAGTTTTA 480 

GAGAGTTTTG CGGGATCTCG CGGAACTGCT CACATCTTTG GCATTTAGTT AGTGCACTGG 540 

TAGCTGTTAA GCCAGGGGCG GTAGCTTGCC TAATTAATTT TTAACGTATA CATTTATTCT 600

TGCCGCTTAT AGCAAATAAA GTCAATCGGA TTAAACTTCT TTTCCATTAG GTAAAAGAGT 660 

GTTTGTAGTC GCTCAGGGAA ATTGGTTTTG GTAGTAGTAC TTTTCAAATT ATCCATTTTC 720

CGATTTAGAT GGCAGTTGAT GTTACTATGC TGCATACATA TCAATGTATA TTATTTACTT 780

TTAGAATGTG ATATGAAAAA AATAGTGATC ATAGGCAATG TAGCGTCAAT GATGTTAAGG 840

TTCAGGAAAG AATTAATCAT GAATTTAGTG AGGCAAGGTG ATAATGTATA TTGTCTAGCA 900 

AATGATTTTT CCACTGAAGA TCTTAAAGTA CTTTCGTCAT GGGGCGTTAA GGGGGTTAAA 960 

TTCTCTCTTA ACTCAAAGGG TATTAATCCT TTTAAGGATA TAATTGCTGT TTATGAACTA 1020

AAAAAAATTC TTAAGGATAT TTCCCCAGAT ATTGTATTTT CATATTTTGT AAAGCCAGTA 1080

ATATTTGGAA CTATTGCTTC AAAGTTGTCA AAAGTGCCAA GGATTGTTGG AATGATTGAA 1140

GGTCTAGGTA ATGCCTTCAC TTATTATAAG GGAAAGCAGA CCACAAAAAC TAAAATGATA 1200 

AAGTGGATAC AAATTCTTTT ATATAAGTTA GCATTACCGA TGCTTGATGA TTTGATTCTA 1260

TTAAATCATG ATGATAAAAA AGATTTAATC GATCAGTATA ATATTAAAGC TAAGGTAACA 1320

GTGTTAGGTG GGATTGGATT GGATCTTAAT GAGTTTTCAT ATAAAGAGCC ACCGAAAGAG 1380

AAAATTACCT TTATTTTTAT AGCAAGGTTA TTAAGAGAGA AAGGGATATT TGAGTTTATT 1440

GAAGCCGCAA AGTTCGTTAA GACAACTTAT CCAAGTTCTG AATTTGTAAT TTTAGGAGGT 1500 

TTTGAGAGTA ATAATCCTTT CTCATTACAA AAAAATGAAA TTGAATCGCT AAGAAAAGAA 1560

CATGATCTTA TTTATCCTGG TCATGTGGAA AATGTTCAAG ATTGGTTAGA GAAAAGTTCT 1620 

GTTTTTGTTT TACCTACATC ATATCGAGAA GGCGTACCAA GGGTGATCCA AGAAGCTATG 1680

GCTATTGGTA GACCTGTAAT AACAACTAAT GTACCTGGGT GTAGGGATAT AATAAATGAT 1740

GGGGTCAATG GCTTTTTGAT ACCTCCATTT GAAATTAATT TACTGGCAGA AAAAATGAAA 1800 

TATTTTATTG AGAATAAAGA TAAAGTACTC GAAATGGGGC TTGCTGGAAG GAAGTTTGCA 1860

GAAAAAAACT TTGATGCTTT TGAAAAAAAT AATAGACTAG CATCAATAAT AAAATCAAAT 1920 

AATGATTTTT GACTTGAGCA GAAATTATTT ATATTTCAAT CTGAAAAATA AAGGCTGTTA 1980 

TTATGAATAA AGTGGCATTA ATTACTGGTA TCACTGGGCA AGATGGCTCC TATTTGGCAG 2040

AATTATTGTT AGAAAAAGGT TATGAAGTTC ATGGTATTAA ACGCCGTGCA TCTTCATTTA 2100 

ATACTGAGCG AGTGGATCAC ATCTATCAGG ATTCACATTT AGCTAATCCT AAACTTTTTC 2160 

TACACTATGG CGATTTGACA GATACTTCCA ATCTGACCCG TATTTTAAAA GAAGTTCAAC 2220
CAGATGAAGT TTACAATTTG GGGGCGATGA GCCATGTAGC GGTATCATTT GAGTCACCAG
AATACACTGC TGATGTTGAT GCGATAGGAA CATTGCGTCT TCTTGAAGCT ATCAGGATAT 
TGGGGCTGGA AAAAAAGACA AAATTTTATC AGGCTTCAAC TTCAGAGCTT TATGGTTTGG 
TTCAAGAAAT TCCACAAAAA GAGACTACGC CATTTTATCC ACGTTCGCCT TATGCTGTTG 
CAAAATTATA TGCCTATTGG ATCACTGTTA ATTATCGTGA GTCTTATGGT ATGTTTGCCT 
GCAATGGTAT TCTCTTTAAC CACGAATCAC CTCGCCGTGG CGAGACCTTT GTTACTCGTA 
AAATAACACG CGGGATAGCA AATATTGCTC AAGGTCTTGA TAAATGCTTA TACTTGGGAA 
ATATGGATTC TCTGCGTGAT TGGGGACATG CTAAGGATTA TGTCAAAATG CAATGGATGA 
TGCTGCAGCA AGAAACTCCA GAAGATTTTG TAATTGCTAC AGGAATTCAA TATTCTGTCC 
GTGAGTTTGT CACAATGGCG GCAGAGCAAG TAGGCATAGA GTTAGCATTT GAAGGTGAGG 
GAGTAAATGA AAAAGGTGTT GTTGTTTCGG TCAATGGCAC TGATGCTAAA GCTGTAAACC 
CGGGCGATGT AATTATATCT GTAGATCCAA GGTATTTTAG GCCTGCAGAA GTTGAAACCT 
TGCTTGGCGA TCCTACTAAT GCGCATAAAA AATTAGGATG GAGCCCTGAA ATTACATTGC
GTGAAATGGT AAAAGAAATG GTTTCCAGCG ATTTAGCAAT AGCGAAAAAG AACGTCTTGC 
TGAAAGCTAA TAACATTGCC ACTAATATTC CGCAAGAATA AAAAAGATAA TACATTAAAT 
AATTAAAAAT GGTGCTAGAT TTATTAGTAC CATTATTTTT TTTTGGGTGA CTAATGTTTA 
TTACATCAGA TAAATTTAGA GAAATTATCA AGTTAGTTCC ATTAGTATCA ATTGATCTGC
TAATTGAAAA CGAGAATGGT GAATATTTAT TTGGTCTTAG GAATAATCGA CCGGCCAAAA 
ATTATTTTTT TGTTCCAGGT GGTAGGATTC GCAAAAATGA ATCTATTAAA AATGCTTTTA 
AAAGAATATC ATCTATGGAA TTAGGTAAAG AGTATGGTAT TTCAGGAAGT GTTTTTAATG 
GTGTATGGGA ACATTTCTAT GATGATGGTT TTTTTTCTGA AGGCGAGGCA ACACATTATA 
TAGTGCTTTG TTACACACTG AAAGTTCTTA AAAGTGAATT GAATCTCCCA GATGATCAAC 
ATCGTGAATA CCTTTGGCTA ACTAAACACC AAATAAATGC TAAACAAGAT GTTCATAACT 
ATTCAAAAAA TTATTTTTTG TAATTTTTAT TAAAAATTAA TATGCGAGAG AATTGTATGT 
CTCAATGTCT TTACCCTGTA ATTATTGCCG GAGGAACCGG AAGCCGTCTA TGGCCGTTGT 
CTCGAGTATT ATACCCTAAA CAATTTTTAA ATTTAGTTGG GGATTCTACA ATGTTGCAAA 
CAACAATTAC GCGTTTGGAT GGCATCGAAT GCGAAAATCC AATTGTTATC TGCAATGAAG 
ATCACCGATT TATTGTAGCA GAGCAATTAC GACAGATTGG TAAGCTAACC AAGAATATTA 
TACTTGAGCC GAAAGGCCGT AATACTGCAC CTGCCATAGC TTTAGCTGCT TTTATCGCTC 
AGAAGAATAA TCCTAATGAC GACCCTTTAT TATTAGTACT TGCGGCAGAC CACTCTATAA 
ATAATGAAAA AGCATTTCGA GAGTCAATAA TAAAAGCTAT GCCGTATGCA ACTTCTGGGA 
AGTTAGTAAC ATTTGGAATT ATTCCGGACA CGGCAAATAC TGGTTATGGA TATATTAAGA 
GAAGTTCTTC AGCTGATCCT AATAAAGAAT TCCCAGCATA TAATGTTGCG GAGTTTGTAG 
AAAAACCAGA TGTTAAAACA GCACAGGAAT ATATTTCGAG TGGGAATTAT TACTGGAATA 
GCGGAATGTT TTTATTTCGC GCCAGTAAAT ATCTTGATGA ACTACGGAAA TTTAGACCAG
ATATTTATCA TAGCTGTGAA TGTGCAACCG CTACAGCAAA TATAGATATG GACTTTGTCC
GAATTAACGA GGCTGAGTTT ATTAATTGTC CTGAAGAGTC TATCGATTAT GCTGTGATGG 
AAAAAACAAA AGACGCTGTA GTTCTTCCGA TAGATATTGG CTGGAATGAC GTGGGTTCTT
GGTCATCACT TTGGGATATA AGCCAAAAGG ATTGCCATGG TAATGTGTGC CATGGGGATG
TGCTCAATCA TGATGGAGAA AATAGTTTTA TTTACTCTGA GTCAAGTCTG GTTGCGACAG 
TCGGAGTAAG TAATTTAGTA ATTGTCCAAA CCAAGGATGC TGTACTGGTT GCGGACCGTG 
ATAAAGTCCA AAATGTTAAA AACATAGTTG ACGATCTAAA AAAGAGAAAA CGTGCTGAAT 
ACTACATGCA TCGTGCAGTT TTTCGCCCTT GGGGTAAATT CGATGCAATA GACCAAGGCG 
ATAGATATAG AGTAAAAAAA ATAATAGTTA AACCAGGAGA AGGGTTAGAT TTAAGGATGC 
ATCATCATAG GGCAGAGCAT TGGATTGTTG TATCCGGTAC TGCTAAAGTT TCACTAGGTA 
GTGAAGTTAA ACTATTAGTT TCTAATGAGT CTATATATAT CCCTCAGGGA GCAAAATATA 
GTCTTGAGAA TCCAGGCGTA ATACCTTTGC ATCTAATTGA AGTAAGTTCT GGTGATTACC 
TTGAATCAGA TGATATAGTG CGTTTTACTG ACAGATATAA CAGTAAACAA TTCCTAAAGC 
GAGATTGATA AATATGAATA AAATAACTTG CTTCAAAGCA TATGATATAC GTGGGCGTCT 5160

TGGTGCTGAA TTGAATGATG AAATAGCATA TAGAATTGGT CGCGCTTATG GTGAGTTTTT 
TAAACCTCAA ACTGTAGTTG TGGGAGGAGA TGCTCGCTTA ACAAGTGAGA GTTTAAAGAA 
ATCACTCTCA AATGGGCTAT GTGATGCAGG CGTAAATGTC TTAGATCTTG GAATGTGTGG 
TACTGAAGAG ATATATTTTT CCACTTGGTA TTTAGGAATT GATGGTGGAA TCGAGGTAAC 
TGCAAGCCAT AATCCAATTG ATTATAATGG AATGAAATTA GTAACCAAAG GTGCTCGACC 
AATCAGCAGT GACACAGGTC TCAAAGATAT ACAACAATTA GTAGAGAGTA ATAATTTTGA 
AGAGCTCAAC CTAGAAAAAA AAGGGAATAT TACCAAATAT TCCACCCGAG ATGCCTACAT 
AAATCATTTG ATGGGCTATG CTAATCTGCA AAAAATAAAA AAAATCAAAA TAGTTGTGAA 
TTCTGGGAAT GGTGCAGCTG GTCCTGTTAT TGATGCTATT GAGGAATGCT TTTTACGGAA 
CAATATTCCG ATTCAGTTTG TAAAAATAAA TAATACACCC GATGGTAATT TTCCACATGG 
TATCCCTAAT CCATTACTAC CTGAGTGCAG AGAAGATACC AGCAGTGCGG TTATAAGACA 
TAGTGCTGAT TTTGGTATTG CATTTGATGG TGATTTTGAT AGGTGTTTTT TCTTTGATGA 
AAATGGACAA TTTATTGAAG GATACTACAT TGTTGGTTTA TTAGCGGAAG TTTTTTTAGG 
GAAATATCCA AACGCAAAAA TCATTCATGA TCCTCGCCTT ATATGGAATA CTATTGATAT 
CGTAGAAAGT CATGGTGGTA TACCTATAAT GACTAAAACC GGTCATGCTT ACATTAAGCA 
AAGAATGCGT GAAGAGGATG CCGTATATGG CGGCGAAATG AGTGCGCATC ATTATTTTAA 
AGATTTTGCA TACTGCGATA GTGGAATGAT TCCTTGGATT TTAATTTGTG AACTTTTGAG 6180 
TCTGACAAAT AAAAAATTAG GTGAACTGGT TTGTGGTTGT ATAAACGACT GGCCGGCAAG 
TGGAGAAATA AACTGTACAC TAGACAATCC GCAAAATGAA ATAGATAAAT TATTTAATCG 
TTACAAAGAT AGTGCCTTAG CTGTTGATTA CACTGATGGA TTAACTATGG AGTTCTCTGA
TTGGCGTTTT AATGTTAGAT GCTCAAATAC AGAACCTGTA GTACGATTGA ATGTAGAATC
TAGGAATAAT GCTATTCTTA TGCAGGAAAA AACAGAAGAA ATTCTGAATT TTATATCAAA
ATAAATTTGC ACCTGAGTTC ATAATGGGAA CAAGAAATAT ATGAAAGTAC TTCTGACTGG 
CTCAACTGGC ATGGTTGGTA AGAATATATT AGAGCATGAT AGTGCAAGTA AATATAATAT 
ACTTACTCCA ACCAGCTCTG ATTTGAATTT ATTAGATAAA AATGAAATAG AAAAATTCAT 
GCTTATCAAC ATGCCAGACT GTATTATACA TGCAGCGGGA TTAGTTGGAG GCATTCATGC 
AAATATAAGC AGGCCGTTTG ATTTTCTGGA AAAAAATTTG CAGATGGGTT TAAATTTAGT 
TTCCGTCGCA AAAAAACTAG GTATCAAGAA AGTGCTTAAC TTGGGTAGTT CATGCATGTA 
CCCCAAAAAC TTTGAAGAGG CTATTCCTGA GAAAGCTCTG TTAACTGGTG AGCTAGAAGA 
AACTAATGAG GGATATGCTA TTGCGAAAAT TGCTGTAGCA AAAGCATGCG AATATATATC 
AAGAGAAAAC TCTAATTATT TTTATAAAAC AATTATCCCA TGTAATTTAT ATGGGAAATA 
TGATAAATTT GATGATAACT CGTCACATAT GATTCCGGCA GTTATAAAAA AAATCCATCA 
TGCGAAAATT AATAATGTCC CAGAGATCGA AATTTGGGGG GATGGTAATT CGCGCCGTGA
GTTTATGTAT GCAGAAGATT TAGCTGATCT TATTTTTTAT GTTATTCCTA AAATAGAATT 
CATGCCTAAT ATGGTAAATG CTGGTTTAGG TTACGATTAT TCAATTAATG ACTATTATAA 
GATAATTGCA GAAGAAATTG GTTATACTGG GAGTTTTTCT CATGATTTAA CAAAACCAAC 
AGGAATGAAA CGGAAGCTAG TAGATATTTC ATTGCTTAAT AAAATTGGTT GGTCAAGTCA 
CTTTGAACTC AGAGATGGCA TCAGAAAGAC CTATAATTAT TACTTGGAGA ATCAAAATAA 
ATGATTACAT ACCCACTTGC TAGTAATACT TGGGATGAAT ATGAGTATGC AGCAATACAG
TCAGTAATTG ACTCAAAAAT GTTTACCATG GGTAAAAAGG TTGAGTTATA TGAGAAAAAT
TTTGCTGATT TGTTTGGTAG CAAATATGCC GTAATGGTTA GCTCTGGTTC TACAGCTAAT 
CTGTTAATGA TTGCTGCCCT TTTCTTCACT AATAAACCAA AACTTAAAAG AGGTGATGAA 
ATAATAGTAC CTGCAGTGTC ATGGTCTACG ACATATTACC CTCTGCAACA GTATGGCTTA 
AAGGTGAAGT TTGTCGATAT CAATAAAGAA ACTTTAAATA TTGATATCGA TAGTTTGAAA 
AATGCTATTT CAGATAAAAC AAAAGCAATA TTGACAGTAA ATTTATTAGG TAATCCTAAT 
GATTTTGCAA AAATAAATGA GATAATAAAT AATAGGGATA TTATCTTACT AGAAGATAAC 
TGTGAGTCGA TGGGCGCGGT CTTTCAAAAT AAGCAGGCAG GCACATTCGG AGTTATGGGT 
ACCTTTAGTT CTTTTTACTC TCATCATATA GCTACAATGG AAGGGGGCTG CGTAGTTACT 
GATGATGAAG AGCTGTATCA TGTATTGTTG TGCCTTCGAG CTCATGGTTG GACAAGAAAT 
TTACCAAAAG AGAATATGGT TACAGGCACT AAGAGTGATG ATATTTTCGA AGAGTCGTTT 
AAGTTTGTTT TACCAGGATA CAATGTTCGC CCACTTGAAA TGAGTGGTGC TATTGGGATA 
GAGCAACTTA AAAAGTTACC AGGTTTTATA TCCACCAGAC GTTCCAATGC ACAATATTTT 
GTAGATAAAT TTAAAGATCA TCCATTCCTT GATATACAAA AAGAAGTTGG TGAAAGTAGC 
TGGTTTGGTT TTTCCTTCGT TATAAAGGAG GGAGCTGCTA TTGAGAGGAA GAGTTTAGTA
AATAATCTGA TCTCAGCAGG CATTGAATGC CGACCAATTG TTACTGGGAA TTTTCTCAAA 
AATGAACGTG TTTTGAGTTA TTTTGATTAC TCTGTACATG ATACGGTAGC AAATGCCGAA 
TATATAGATA AGAATGGTTT TTTTGTCGGA AACCACCAGA TACCTTTGTT TAATGAAATA 
GATTATCTAC GAAAAGTATT AAAATAACTA ACGAGGCACT CTATTTCGAA TAGAGTGCCT
TTAAGATGGT ATTAACAGTG AAAAAAATTT TAGCGTTTGG CTATTCTAAA GTACTACCAC 
CGGTTATTGA ACAGTTTGTC AATCCAATTT GCATCTTCAT TATCACACCA CTAATACTCA 
ACCACCTGGG TAAGCAAAGC TATGGTAATT GGATTTTATT AATTACTATT GTATCTTTTT 
CTCAGTTAAT ATGTGGAGGA TGTTCCGCAT GGATTGCAAA AATCATTGCA GAACAGAGAA 
TTCTTAGTGA TTTATCAAAA AAAAATGCTT TACGTCAAAT TTCCTATAAT TTTTCAATTG 
TTATTATCGC ATTTGCGGTA TTGATTTCTT TTCTTATATT AAGTATTTGT TTCTTCGATG 
TTGCGAGGAA TAATTCTTCA TTCTTATTCG CGATTATTAT TTGTGGTTTT TTTCAGGAAG 
TTGATAATTT ATTTAGTGGT GCGCTAAAAG GTTTTGAAAA ATTTAATGTA TCATGTTTTT 
TTGAAGTAAT TACAAGAGTG CTCTGGGCTT CTATAGTAAT ATATGGCATT TACGGAAATG 
CACTCTTATA TTTTACATGT TTAGCCTTTA CCATTAAAGG TATGCTAAAA TATATTCTTG 
TATGTCTGAA TATTACCGGT TGTTTCATCA ATCCTAATTT TAATAGAGTT GGGATTGTTA 
ATTTGTTAAA TGAGTCAAAA TGGATGTTTC TTCAATTAAC TGGTGGCGTC TCACTTAGTT 
TGTTTGATAG GCTCGTAATA CCATTGATTT TATCTGTCAG TAAACTGGCT TCTTATGTCC 
CTTGCCTTCA ACTAGCTCAA TTGATGTTCA CTCTTTCTGC GTCTGCAAAT CAAATATTAC 
TACCAATGTT TGCTAGAATG AAAGCATCTA ACACATTTCC CTCTAATTGT TTTTTTAAAA 
TTCTGCTTGT ATCACTAATT TCTGTTTTGC CTTGTCTTGC GTTATTCTTT TTTGGTCGTG
ATATATTATC AATATGGATA AACCCTACAT TTGCAACTGA AAATTATAAA TTAATGCAAA 
TTTTAGCTAT AAGTTACATT TTATTGTCAA TGATGACATC TTTTCATTTC TTGTTATTAG 
GAATTGGTAA ATCTAAGCTT GTTGCAAATT TAAATCTGGT TGCAGGGCTC GCACTTGCTG 
CTTCAACGTT AATCGCAGCT CATTATGGCC TTTATGCAAT ATCTATGGTA AAAATAATAT 
ATCCGGCTTT TCAATTTTAT TACCTTTATG TAGCTTTTGT CTATTTTAAT AGAGCGAAAA 
ATGTCTATTG ATTTACTTTT TTCAATTACT GAAATCGCAA TTGTTTTTTC TTGCACTATT 
TACATATTTA CTCAATGTTT GTTAATGCGG AGGATCTATT TAGATAAAAG TATTTTAATT 
CTTTTATGCT TGCTCTTTTT TTTAGTAATC ATTCAACTTC CTGAGCTTAA TGTAAACGGT 
TTGGTCGATT CTTTAAAGTT ATCACTGCCT TTATTGATGG TCTTTATCGC TTTTCAAAAA 
CCGAAATTAT GCTTGTGGGT TATTATTGCA TTGTTGTTTT TGAACTCTGC ATTTAATTTT 
TTATATTTAA AGACATTCGA TAAGTTTAGC TCATTTCCTT TTACTTTTTT TATATTGCTG 
TTTTACTTGT TTAGATTGGG AATTGGTAAT TTACCGGTTT ATAAAAATAA AAAATTTTAC 
GCGTTGATTT TTCTCTTTAT ATTAATAGAC ATAATGCAGT CATTGTTAAT AAATTATAGG 
GGGCAGATTT TATATTCCGT AATTTGCATC CTGATACTTG TGTTTAAAGT TAATTTAAGA 10440
AAAAAGATTC CATACTTTTT TTTAATGCTG CCAGTTTTAT ATGTAATTAT TATGGCTTAT 10500 
ATTGGTTTTA ATTATTTCAA TAAAGGCGTA ACTTTTTTTG AACCTACAGC AAGTAATATT 10560
GAACGTACGG GGATGATATA TTATTTGGTT TCACAGCTTG GTGATTATAT ATTCCATGGT 10620 
ATGGGGACAT TAAATTTCTT AAATAACGGC GGACAATATA AGACGTTATA TGGACTTCCA 10680 
TCATTAATTC CTAATGACCC TCATGATTTT TTATTACGGT TCTTTATAAG TATTGGTGTG 10740 
ATAGGAGCAT TGGTTTATCA TTCTATATTT TTTGTTTTTT TTAGGAGAAT ATCTTTCTTA 10800 
TTATATGAGA GAAATGCTCC TTTCATTGTT GTAAGTTGTT TGTTACTGTT ACAAGTTGTG 
TTAATTTATA CATTAAACCC TTTTGATGCT TTTAATCGAT TGATTTGCGG GCTTACAGTT 
GGAGTTGTTT ATGGATTTGC AAAAATTAGA TAAGTATACC TGTAATGGAA ATTTAGACGC 
TCCACTTGTT TCAATAATCA TTGCAACTTA TAATTCTGAA CTTGATATAG CTAAGTGTTT 
GCAATCGGTA ACTAATCAAT CTTATAAGAA TATTGAAATC ATAATAATGG ATGGAGGATC 
TTCTGATAAA ACGCTTGATA TTGCAAAATC GTTTAAAGAC GACCGAATAA AAATAGTTTC 
AGAGAAAGAT CGTGGAATTT ATGATGCCTG GAATAAAGCA GTTGATTTAT CCATTGGTGA 
TTGGGTAGCA TTTATTGGTT CAGATGATGT TTACTATCAT ACAGATGCAA TTGCTTCATT 
GATGAAGGGG GTTATGGTAT CTAATGGCGC CCCTGTGGTT TATGGGAGGA CAGCGCACGA 
AGGTCCCGAT AGGAACATAT CTGGATTTTC AGGCAGTGAA TGGTACAACC TAACAGGATT 
TAAGTTTAAT TATTACAAAT GTAATTTACC ATTGCCCATT ATGAGCGCAA TATATTCTCG
TGATTTCTTC AGAAACGAAC GTTTTGATAT TAAATTAAAA ATTGTTGCTG ACGCTGATTG 
GTTTCTGAGA TGTTTCATCA AATGGAGTAA AGAGAAGTCA CCTTATTTTA TTAATGACAC 
GACCCCTATT GTTAGAATGG GATATGGTGG GGTTTCGACT GATATTTCTT CTCAAGTTAA 
AACTACGCTA GAAAGTTTCA TTGTACGCAA AAAGAATAAT ATATCCTGTT TAAACATACA 
GCTGATTCTT AGATATGCTA AAATTCTGGT GATGGTAGCG ATCAAAAATA TTTTTGGCAA 
TAATGTTTAT AAATTAATGC ATAACGGGTA TCATTCCCTA AAGAAAATCA AGAATAAAAT 
ATGAAGATTG TTTATATAAT AACCGGGCTT ACTTGTGGTG GAGCCGAACA CCTTATGACG 
CAGTTAGCAG ACCAAATGTT TATACGCGGG CATGATGTTA ATATTATTTG TCTAACTGGT
ATATCTGAGG TAAAGCCAAC ACAAAATATT AATATTCATT ATGTTAATAT GGATAAAAAT 
TTTAGAAGCT TTTTTAGAGC TTTATTTCAA GTAAAAAAAA TAATTGTCGC CTTAAAGCCA 
GATATAATAC ATAGTCATAT GTTTCATGCT AATATTTTTA GTCGTTTTAT TAGGATGCTG 12120 
ATTCCAGCGG TGCCCCTGAT ATGTACCGCA CACAACAAAA ATGAAGGTGG CAATGCAAGG 12180 
ATGTTTTGTT ATCGACTGAG TGATTTTTTA GCTTCTATTA CTACAAATGT AAGTAAAGAG 12240 
GCTGTTCAAG AGTTTATAGC AAGAAAGGCT ACACCTAAAA ATAAAATAGT AGAGATTCCG 12300 
AATTTTATTA ATACAAATAA ATTTGATTTT GATATTAATG TCAGAAAGAA AACGCGAGAT 12360 
GCTTTTAATT TGAAAGACAG TACAGCAGTA CTGCTCGCAG TAGGAAGACT TGTTGAAGCA 12420 
AAAGACTATC CGAACTTATT AAATGCAATA AATCATTTGA TTCTTTCAAA AACATCAAAT 12480
TGTAATGATT TTATTTTGCT TATTGCTGGC GATGGCGCAT TAAGAAATAA ATTATTGGAT 12540 
TTGGTTTGTC AATTGAATCT TGTGGATAAA GTTTTCTTCT TGGGGCAAAG AAGTGATATT 12600 
AAAGAATTAA TGTGTGCTGC AGATCTTTTT GTTTTGAGTT CTGAGTGGGA AGGTTTTGGT 12660 
CTCGTTGTTG CAGAAGCTAT GGCGTGTGAA CGTCCCGTTG TTGCTACCGA TTCTGGTGGA 
GTTAAAGAAG TCGTTGGACC TCATAATGAT GTTATCCCTG TCAGTAATCA TATTCTGTTG
GCAGAGAAAA TCGCTGAGAC ACTTAAAATA GATGATAACG CAAGAAAAAT AATAGGTATG 12840
AAAAATAGAG AATATATTGT TTCCAATTTT TCAATTAAAA CGATAGTGAG TGAGTGGGAG 12900 
CGCTTATATT TTAAATATTC CAAGCGTAAT AATATAATTG ATTGAAAATA TAAGTTTGTA 12960 
CTCTGGATGC AATAGTTTCT CTATGCTGTT TTTTTACTGG CTCCGTATTT TTACTTATAG 13020 
CTGGATTTTG TTATATATCA GTATTAATCT GTCTCAACTT CATCTAGACT ACATTCAAGC 13080 
CGCGCATGCG TCGCGCGGTG ACTACACCTG ACAGGAGTAT GTAATGTCCA AGCAACAGAT 13140
CGGCGTCGTC GGTATGGCAG TGATGGGGCG CAACCTGGCG CTCAACATCG AAAGCCGCGG 13200 
TTATACCGTC TCCATCTTCA ACCGCTCCCG CGAGAAAACT GAAGAAGTTG TTGCCGAGAA 13260 
CCCGGATAAG AAACTGGTTC CTTATTACAC GGTGAAAGAG TTCGTCGAGT CTCTTGAAAC 13320 
CCCACGTCGT ATCCTGTTAA TGGTAAAAGC AGGGGCGGGA ACTGATGCTG CTATCGATTC 13380
CCTGAAGCCG TATCTGGATA AAGGCGACAT CATTATTGAT GGTGGCAACA CCTTCTTCCA 13440 
GGACACTATC CGTCGTAACC GTGAACTGTC CGCGGAAGGC TTTAACTTCA TCGGTACCGG 13500 
CGTGTCCGGC GGTGAAGAGG GCGCCCTGAA AGGCCCATCT ATCATGCCAG GTGGCCAGAA 13560 
AGAAGCGTAT GAGCTGGTTG CGCCTATCCT GACCAAGATT GCTGCGGTTG CTGAAGATGG 13620 
CGAACCATGT ATAACTTACA TCGGTGCTGA CGGTGCGGGT CACTACGTGA AGATGGTGCA 13680
CAACGGTATC GAATATGGCG ATATGCAGCT GATTGCTGAA GCCTATTCTC TGCTTAAAGG 13740 
CGGCCTTAAT CTGTCTAACG AAGAGCTGGC AACCACTTTT ACCGAGTGGA ATGAAGGCGA 13800
GCTAAGTAGC TACCTGATTG ACATCACCAA AGACATCTTC ACCAAAAAAG ATGAAGAGGG 13860 
TAAATACCTG GTTGATGTGA TCCTGGACGA AGCTGCGAAC AAAGGCACCG GTAAATGGAC 13920 
CAGCCAGAGC TCTCTGGATC TGGGTGAACC GCTGTCGCTG ATCACCGAAT CCGTATTCGC 13980 
TCGCTACATC TCTTCTCTGA AAGACCAGCG CATTGCGGCA TCTAAAGTGC TGTCTGGTCC 14040 
GCAGGCTAAA CTGGCTGGTG ATAAAGCAGA GTTCGTTGAG AAAGTCCGTC GCGCGCTGTA 14100
CCTGGGTAAA ATCGTCTCTT ATGCCCAAGG CTTCTCTCAA CTGCGTGCCG CGTCTGACGA 14160 
ATACAACTGG GATCTGAACT ACGGCGAAAT CGCGAAGATC TTCCGCGCGG GCTGCATCAT 14220 
TCGTGCGCAG TTCCTGCAGA AAATTACTGA CGCGTATGCT GAAAACAAAG GCATTGCTAA 14280 
CCTGTTGCTG GCTCCGTACT TCAAAAATAT CGCTGATGAA TATCAGCAAG CGCTGCGTGA 14340
TGTAGTGGCT TATGCTGTGC AGAACGGTAT TCCGGTACCG ACCTTCTCTG CAGCGGTAGC 14400 
CTACTACGAC AGCTACCGTT CTGCGGTACT GCCGGCTAAT CTGATTCAGG CACAGCGTGA 14460 
TTACTTCGGT GCGCACACGT ATAAACGCAC TGATAAAGAA GGTGTGTTCC ACACCG

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14024 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: YES
(v) ORIGINAL SOURCE:

(A) ORGANISM: Escherichia coli
(vi) Note that the first 19 bp are from the primer used for the long PCR

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
GTAACCAAGG GCGGTACGTG CATAAATTTT AATGCTTATC AAAACTATTA GCATTAAAAA 
TATATAAGAA ATTCTCAAAT GAACAAAGAA ACCGTTTCAA TAATTATGCC CGTTTACAAT 
GGGGCCAAAA CTATAATCTC ATCAGTAGAA TCAATTATAC ATCAATCTTA TCAAGATTTT 
GTTTTGTATA TCATTGACGA TTGTAGCACC GATGATACAT TTTCATTAAT CAACAGTCGA 
TACAAAAACA ATCAGAAAAT AAGAATATTG CGTAACAAGA CAAATTTAGG TGTTGCAGAA 
AGTCGAAATT ATGGAATAGA AATGGCCACG GGGAAATATA TTTCTTTTTG TGATGCGGAT 
GATTTGTGGC ACGAGAAAAA ATTAGAGCGT CAAATCGAAG TGTTAAATAA TGAATGTGTA 
GATGTGGTAT GTTCTAATTA TTATGTTATA GATAACAATA GAAATATTGT TGGCGAAGTT 
AATGCTCCTC ATGTGATAAA TTATAGAAAA ATGCTCATGA AAAACTACAT AGGGAATTTG 
ACAGGAATCT ATAATGCCAA CAAATTGGGT AAGTTTTATC AAAAAAAGAT TGGTCACGAG 
GATTATTTGA TGTGGCTGGA AATAATTAAT AAAACAAATG GTGCTATTTG TATTCAAGAT 
AATCTGGCGT ATTACATGCG TTCAAATAAT TCACTATCGG GTAATAAAAT TAAAGCTGCA
AAATGGACAT GGAGTATATA TAGAGAACAT TTACATTTGT CCTTTCCAAA AACATTATAT 
TATTTTTTAT TATATGCTTC AAATGGAGTC ATGAAAAAAA TAACACATTC ACTATTAAGG 
AGAAAGGAGA CTAAAAAGTG AAGTCAGCGG CTAAGTTGAT TTTTTTATTC CTATTTACAC 
TTTATAGTCT CCAGTTGTAT GGGGTTATCA TAGATGATCG TATAACAAAT TTTGATACAA 
AGGTATTAAC TAGTATTATA ATTATATTTC AGATTTTTTT TGTTTTATTA TTTTATCTAA 
CGATTATAAA TGAAAGAAAA CAGCAGAAAA AATTTATCGT GAACTGGGAG CTAAAGTTAA 
TACTCGTTTT CCTTTTTGTG ACTATAGAAA TTGCTGCTGT AGTTTTATTT CTTAAAGAAG 
GTATTCCTAT ATTJGATGAT GATCCAGGGG GGGCTAAACT TAGAATAGCT GAAGGTAATG
GACTTTACAT TAGATATATT AAGTATTTTG GTAATATAGT TGTGTTTGCA TTAATTATTC 1260
TTTATGATGA GCATAAATTC AAACAGAGGA CCATCATATT TGTATATTTT ACAACGATTG 1320
CTTTATTTGG TTATCGTTCT GAATTGGTGT TGCTCATTCT TCAATATATA TTGATTACCA 1380
ATATCCTGTC AAAGGATAAC CGTAATCCTA AAATAAAAAG AATAATAGGG TATTTTTTAT 1440
TGGTAGGGGT TGTATGCTCG TTGTTTTATC TAAGTTTAGG ACAAGACGGA GAACAAAATG 1500
ACTCATATAA TAATATGTTA AGGATAATTA ATAGGTTAAC AATAGAGCAA GTTGAAGGTG 1560
TTCCATATGT TGTTTCTGAA TCTATTAAGA ACGATTTCTT TCCGACACCA GAGTTAGAAA 1620
AGGAATTAAA AGCAATAATA AATAGAATAC AGGGAATAAA GCATCAAGAC TTATTTTATG 1680
GAGAACGGTT ACATAAACAA GTATTTGGAG ACATGGGAGC AAATTTTTTA TCAGTTACTA 1740
CGTATGGAGC AGAACTGTTA GTTTTTTTTG GTTTTCTCTG TGTATTCATT ATCCCTTTAG 1800
GGATATATAT ACCTTTTTAT CTTTTAAAGA GAATGAAAAA AACCCATAGC TCGATAAATT 1860
GCGCATTCTA TTCATATATC ATTATGATTT TATTGCAATA CTTAGTGGCT GGGAATGCAT 1920
CGGCCTTCTT TTTTGGTCCT TTTCTCTCCG TATTGATAAT GTGTACTCCT CTGATCTTAT 1980
TGCATGATAC GTTAAAGAGA TTATCACGAA ATGAAAATAT CAGTTATAAC TGTGACTTAT 2040
AATAATGCTG AAGGGTTAGA AAAAACTTTA AGTAGTTTAT CAATTTTAAA AATAAAACCT 2100
TTTGAGATTA TTATAGTTGA TGGCGGCTCT ACAGATGGAA CGAATCGTGT CATTAGTAGA 2160
TTTACTAGTA TGAATATTAC ACATGTTTAT GAAAAAGATG AAGGGATATA TGATGCGATG 2220
AATAAGGGCC GAATGTTGGC CAAAGGCGAC TTAATACATT ATTTAAACGC CGGCGATAGC 2280
GTAATTGGAG ATATATATAA AAATATCAAA GAGCCATGTT TGATTAAAGT TGGCCTTTTC 2340
GAAAATGATA AACTTCTGGG ATTTTCTTCT ATAACCCATT CAAATACAGG GTATTGTCAT 2400
CAAGGGGTGA TTTTCCCAAA GAATCATTCA GAATATGATC TAAGGTATAA AATATGTGCT 2460
GATTATAAGC TTATTCAAGA GGTGTTTCCT GAAGGGTTAA GATCTCTATC TTTGATTACT 2520
TCGGGTTATG TAAAATATGA TATGGGGGGA GTATCTTCAA AAAAAAGAAT TTTAAGAGAT 2580
AAAGAGCTTG CCAAAATTAT GTTTGAAAAA AATAAAAAAA ACCTTATTAA GTTTATTCCA 2640
ATTTCAATAA TCAAAATTTT ATTCCCTGAA CGTTTAAGAA GAGTATTGCG GAAAATGCAA 2700
TATATTTGTC TAACTTTATT CTTCATGAAG AATAGTTCAC CATATGATAA TGAATAAAAT 2760
CAAAAAAATA CTTAAATTTT GCACTTTAAA AAAATATGAT ACATCAAGTG CTTTAGGTAG 2820
AGAACAGGAA AGGTACAGGA TTATATCCTT GTCTGTTATT TCAAGTTTGA TTAGTAAAAT 2880
ACTCTCACTA CTTTCTCTTA TATTAACTGT AAGTTTAACT TTACCTTATT TAGGACAAGA 2940
GAGATTTGGT GTATGGATGA CTATTACCAG TCTTGGTGCT GCTCTGACAT TTTTGGACTT 3000
AGGTATAGGA AATGCATTAA CAAACAGGAT CGCACATTCA TTTGCGTGTG GCAAAAATTT 3060
AAAGATGAGT CGGCAAATTA GTGGTGGGCT CACTTTGCTG GCTGGATTAT CGTTTGTCAT 3120
AACTGCAATA TGCTATATTA CTTCTGGCAT GATTGATTGG CAACTAGTAA TAAAAGGTAT 3180

PCT/AU98/00315 
WO 98/50531 



AAACGAGAAT GTGTATGCAG AGTTACAACA CTCAATTAAA GTCTTTGTAA TCATATTTGG 3240
ACTTGGAATT TATTCAAATG GTGTGCAAAA AGTTTATATG GGAATACAAA AAGCCTATAT 3300
AAGTAATATT GTTAATGCCA TATTTATATT GTTATCTATT ATTACTCTAG TAATATCGTC 3360
GAAACTACAT GCGGGACTAC CAGTTTTAAT TGTCAGCACT CTTGGTATTC AATACATATC 
GGGAATCTAT TTAACAATTA ATCTTATTAT AAAGCGATTA ATAAAGTTTA CAAAAGTTAA 
CATACATGCT AAAAGAGAAG CTCCATATTT GATATTAAAC GGTTTTTTCT TTTTTATTTT 
ACAGTTAGGC ACTCTGGCAA CATGGAGTGG TGATAACTTT ATAATATCTA TAACATTGGG 
TGTTACTTAT GTTGCTGTTT TTAGCATTAC ACAGAGATTA TTTCAAATAT CTACGGTCCC 
TCTTACGATT TATAACATCC CGTTATGGGC TGCTTATGCA GATGCTCATG CACGCAATGA
TACTCAATTT ATAAAAAAGA CGCTCAGAAC ATCATTGAAA ATAGTGGGTA TTTCATCATT
CTTATTGGCC TTCATATTAG TAGTGTTCGG TAGTGAAGTC GTTAATATTT GGACAGAAGG 
AAAGATTCAG GTACCTCGAA CATTCATAAT AGCTTATGCT TTATGGTCTG TTATTGATGC 
TTTTTCGAAT ACATTTGCAA GCTTTTTAAA TGGTTTGAAC ATAGTTAAAC AACAAATGCT 
TGCTGTTGTA ACATTGATAT TGATCGCAAT TCCAGCAAAA TACATCATAG TTAGCCATTT 
TGGGTTAACT GTTATGTTGT ACTGCTTCAT TTTTATATAT ATTGTAAATT AGTTTATATG 
GTATAAATGT AGTTTTAAAA AACATATCGA TAGACAGTTA AATATAAGAG GATGAAAATG 
AAATATATAC CAGTTTACCA ACCGTCATTG ACAGGAAAAG AAAAAGAATA TGTAAATGAA 
TGTCTGGACT CAACGTGGAT TTCATCAAAA GGAAACTATA TTCAGAAGTT TGAAAATAAA 
TTTGCGGAAC AAAACCATGT GCAATATGCA ACTACTGTAA GTAATGGAAC GGTTGCTCTT
CATTTAGCTT TGTTAGCGTT AGGTATATCG GAAGGAGATG AAGTTATTGT TCCAACACTG 
ACATATATAG CATCAGTTAA TGCTATAAAA TACACAGGAG CCACCCCCAT TTTCGTTGAT 
TCAGATAATG AAACTTGGCA AATGTCTGTT AGTGACATAG AACAAAAAAT CACTAATAAA 
ACTAAAGCTA TTATGTGTGT CCATTTATAC GGACATCCAT GTGATATGGA ACAAATTGTA 
GAACTGGCCA AAAGTAGAAA TTTGTTTGTA ATTGAAGATT GCGCTGAAGC CTTTGGTTCT 
AAATATAAAG GTAAATATGT GGGAACATTT GGAGATATTT CTACTTTTAG CTTTTTTGGA 
AATAAAACTA TTACTACAGG TGAAGGTGGA ATGGTTGTCA CGAATGACAA AACACTTTAT 
GACCGTTGTT TACATTTTAA AGGCCAAGGA TTAGCTGTAC ATAGGCAATA TTGGCATGAC 
GTTATAGGCT ACAATTATAG GATGACAAAT ATCTGCGCTG CTATAGGATT AGCCCAGTTA 
GAACAAGCTG ATGATTTTAT ATCACGAAAA CGTGAAATTG CTGATATTTA TAAAAAAAAT 
ATCAACAGTC TTGTACAAGT CCACAAGGAA AGTAAAGATG TTTTTCACAC TTATTGGATG 
GTCTCAATTC TAACTAGGAC CGCAGAGGAA AGAGAGGAAT TAAGGAATCA CCTTGCAGAT 
AAACTCATCG AAACAAGGCC AGTTTTTTAC CCTGTCCACA CGATGCCAAT GTACTCGGAA 
AAATATCAAA AGCACCCTAT AGCTGAGGAT CTTGGTTGGC GTGGAATTAA TTTACCTAGT 
TTCCCCAGCC TATCGAATGA GCAAGTTATT TATATTTGTG AATCTATTAA CGAATTTTAT 
AGTGATAAAT AGCCTAAAAT ATTGTAAAGG TCATTCATGA AAATTGCGTT GAATTCAGAT 
GGATTTTACG AGTGGGGCGG TGGAATTGAT TTTATTAAAT ATATTCTGTC AATATTAGAA 
ACGAAACCAG AAATATGTAT CGATATTCTT TTACCGAGAA ATGATATACA TTCTCTTATA 
AGAGAAAAAG CATTTCCTTT TAAAAGTATA TTAAAAGCAA TTTTAAAGAG GGAAAGGCCT 
CGATGGATTT CATTAAATAG ATTTAATGAG CAATACTATA GAGATGCCTT TACACAAAAT 
AATATAGAGA CGAATCTTAC CTTTATTAAA AGTAAGAGCT CTGCCTTTTA TTCATATTTT 
GATAGTAGCG ATTGTGATGT TATTCTTCCT TGCATGCGTG TTCCTTCGGG AAATTTGAAT 
AAAAAAGCAT GGATTGGTTA TATTTATGAC TTTCAACACT GTTACTATCC TTCATTTTTT 
AGTAAGCGAG AAATAGATCA AAGGAATGTG TTTTTTAAAT TGATGCTCAA TTGCGCTAAC 
AATATTATTG TTAATGCACA TTCAGTTATT ACCGATGCAA ATAAATATGT TGGGAATTAT 
TCTGCAAAAC TACATTCTCT TCCATTTAGT CCATGCCCTC AATTAAAATG GTTCGCTGAT 
TACTCTGGTA ATATTGCCAA ATATAATATT GACAAGGATT ATTTTATAAT TTGCAATCAA 
TTTTGGAAAC ATAAAGATCA TGCAACTGCT TTTAGGGCAT TTAAAATTTA TACTGAATAT 
AATCCTGATG TTTATTTAGT ATGCACGGGA GCTACTCAAG ATTATCGATT CCCTGGATAT 
TTTAATGAAT TGATGGTTTT GGCAAAAAAG CTCGGAATTG AATCGAAAAT TAAGATATTA 
GGGCATATAC CTAAACTTGA ACAAATTGAA TTAATCAAAA ATTGCATTGC TGTAATACAA 
CCAACCTTAT TTGAAGGCGG GCCTGGAGGG GGGGTAACAT TTGACGCTAT TGCATTAGGG 
AAAAAAGTTA TACTATCTGA CATAGATGTC AATAAAGAAG TTAATTGCGG TGATGTATAT 
TTCTTTCAGG CAAAAAACCA TTATTCATTA AATGACGCGA TGGTAAAAGC TGATGAATCT 
AAAATTTTTT ATGAACCTAC AACTCTGATA GAATTGGGTC TCAAAAGACG CAATGCGTGT 
GCAGATTTTC TTTTAGATGT TGTGAAACAA GAAATTGAAT CCCGATCTTA ATATATTCAA 
GAGGTATATA ATGACTAAAG TCGCTCTTAT TACAGGTGTA ACTGGACAAG ATGGATCTTA 
TCTAGCTGAG TTTTTGCTTG ATAAAGGGTA TGAAGTTCAT GGTATCAAAC GCCGAGCCTC 
ATCTTTTAAT ACAGAACGCA TAGACCATAT TTATCAAGAT CCACATGGTT CTAACCCAAA 
TTTTCACTTG CACTATGGAG ATCTGACTGA TTCATCTAAC CTCACTAGAA TTCTAAAGGA 
GGTACAGCCA GATGAAGTAT ATAATTTAGC TGCTATGAGT CACGTAGCAG TTTCTTTTGA 
GTCTCCAGAA TATACAGCCG ATGTCGATGC AATTGGTACA TTACGTTTAC TGGAAGCAAT 
TCGCTTTTTA GGATTGGAAA ACAAAACGCG TTTCTATCAA GCTTCAACCT CAGAATTATA 
TGGACTTGTT CAGGAAATCC CTCAAAAAGA ATCCACCCCT TTTTATCCTC GTTCCCCTTA 
TGCAGTTGCA AAACTTTACG CATATTGGAT CACGGTAAAT TATCGAGAGT CATATGGTAT 
TTATGCATGT AATGGTATAT TGTTCAATCA TGAATCTCCA CGCCGTGGAG AAACGTTTGT 
AACAAGGAAA ATTACTCGAG GACTTGCAAA TATTGCACAA GGCTTGGAAT CATGTTTGTA 
TTTAGGGAAT ATGGATTCGT TACGAGATTG GGGACATGCA AAAGATTATG TTAGAATGCA 
ATGGTTGATG TTACAACAGG AGCAACCCGA AGATTTTGTG ATTGCAACAG GAGTCCAATA 
CTCAGTCCGT CAGTTTGTCG AAATGGCAGC AGCACAACTT GGTATTAAGA TGAGCTTTGT 
TGGTAAAGGA ATCGAAGAAA AAGGCATTGT AGATTCGGTT GAAGGACAGG ATGCTCCAGG 
TGTGAAACCA GGTGATGTCA TTGTTGCTGT TGATCCTCGT TATTTCCGAC CAGCTGAAGT 
TGATACTTTG CTTGGAGATC CGAGCAAAGC TAATCTCAAA CTTGGTTGGA GACCAGAAAT 
TACTCTTGCT GAAATGATTT CTGAAATGGT TGCCAAAGAT CTTGAAGCCG CTAAAAAACA 
TTCTCTTTTA AAATCGCATG GTTTTTCTGT AAGCTTAGCT CTGGAATGAT GATGAATAAG 
CAACGTATTT TTATTGCTGG TCACCAAGGA ATGGTTGGAT CAGCTATTAC CCGACGCCTC 
AAACAACGTG ATGATGTTGA GTTGGTTTTA CGTACTCGGG ATGAATTGAA CTTGTTGGAT 
AGTAGCGCTG TTTTGGATTT TTTTTCTTCA CAGAAAATCG ACCAGGTTTA TTTGGCAGCA 
GCAAAAGTCG GAGGTATTTT AGCTAACAGT TCTTATCCTG CCGATTTTAT ATATGAGAAT 
ATAATGATAG AGGCGAATGT CATTCATGCT GCCCACAAAA ATAATGTAAA TAAACTGCTT 
TTCCTCGGTT CGTCGTGTAT TTATCCTAAG TTAGCACACC AACCGATTAT GGAAGACGAA 
TTATTACAAG GGAAACTTGA GCCAACAAAT GAACCTTATG CTATCGCAAA AATTGCAGGT 
ATTAAATTAT GTGAATCTTA TAACCGTCAG TTTGGGCGTG ATTACCGTTC AGTAATGCCA 
ACCAATCTTT ATGGTCCAAA TGACAATTTT CATCCAAGTA ATTCTCATGT GATTCCGGCG 
CTTTTGCGCC GCTTTCATGA TGCTGTGGAA AACAATTCTC CGAATGTTGT TGTTTGGGGA 
AGTGGTACTC CAAAGCGTGA ATTCTTACAT GTAGATGATA TGGCTTCTGC AAGCATTTAT 
GTCATGGAGA TGCCATACGA TATATGGCAA AAAAATACTA AAGTAATGTT GTCTCATATC 
AATATTGGAA CAGGTATTGA CTGCACGATT TGTGAGCTTG CGGAAACAAT AGCAAAAGTT 
GTAGGTTATA AAGGGCATAT TACGTTCGAT ACAACAAAGC CCGATGGAGC CCCTCGAAAA 
CTACTTGATG TAACGCTTCT TCATCAACTA GGTTGGAATC ATAAAATTAC CCTTCACAAG 
GGTCTTGAAA ATACATACAA CTGGTTTCTT GAAAACCAAC TTCAATATCG GGGGTAATAA 
TGTTTTTACA TTCCCAAGAC TTTGCCACAA TTGTAAGGTC TACTCCTCTT ATTTCTATAG 
ATTTGATTGT GGAAAACGAG TTTGGCGAAA TTTTGCTAGG AAAACGAATC AACCGCCCGG 
CACAGGGCTA TTGGTTCGTT CCTGGTGGTA GGGTGTTGAA AGATGAAAAA TTGCAGACAG 
CCTTTGAACG ATTGACAGAA ATTGAACTAG GAATTCGTTT GCCTCTCTCT GTGGGTAAGT 
TTTATGGTAT CTGGCAGCAC TTCTACGAAG ACAATAGTAT GGGGGGAGAC TTTTCAACGC 
ATTATATAGT TATAGCATTC CTTCTTAAAT TACAACCAAA CATTTTGAAA TTACCGAAGT 
CACAACATAA TGCTTATTGC TGGCTATCGC GAGCAAAGCT GATAAATGAT GACGATGTGC 
ATTATAATTG TCGCGCATAT TTTAACAATA AAACAAATGA TGCGATTGGC TTAGATAATA 
AGGATATAAT ATGTCTGATG CGCCAATAAT TGCTGTAGTT ATGGCCGGTG GTACAGGCAG 
TCGTCTTTGG CCACTTTCTC GTGAACTATA TCCAAAGCAG TTTTTACAAC TCTCTGGTGA 
TAACACCTTG TTACAAACGA CTTTGCTACG ACTTTCAGGC CTATCATGTC AAAAACCATT 
AGTGATAACA AATGAACAGC ATCGCTTTGT TGTGGCTGAA CAGTTAAGGG AAATAAATAA 
ATTAAATGGT AATATTATTC TAGAACCATG CGGGCGAAAT ACTGCACCAG CAATAGCGAT 
ATCTGCGTTT CATGCGTTAA AACGTAATCC TCAGGAAGAT CCATTGCTTC TAGTTCTTGC 
GGCAGACCAC GTTATAGCTA AAGAAAGTGT TTTCTGTGAT GCTATTAAAA ATGCAACTCC 
CATCGCTAAT CAAGGTAAAA TTGTAACGTT TGGAATTATA CCAGAATATG CTGAAACTGG 
TTATGGGTAT ATTGAGAGAG GTGAACTATC TGTACCGCTT CAAGGGCATG AAAATACTGG 
TTTTTATTAT GTAAATAAGT TTGTCGAAAA GCCTAATCGT GAAACCGCAG AATTGTATAT 
GACTTCTGGT AATCACTATT GGAATAGTGG AATATTCATG TTTAAGGCAT CTGTTTATCT 
TGAGGAATTG AGAAAATTTA GACCTGACAT TTACAATGTT TGTGAACAGG TTGCCTCATC 
CTCATACATT GATCTAGATT TTATTCGATT ATCAAAAGAA CAATTTCAAG ATTGTCCTGC 
TGAATCTATT GATTTTGCTG TAATGGAAAA AACAGAAAAA TGTGTTGTAT GCCCTGTTGA 
TATTGGTTGG AGTGACGTTG GATCTTGGCA ATCGTTATGG GACATTAGTC TAAAATCGAA 
AACAGGAGAT GTATGTAAAG GTGATATATT AACCTATGAT ACTAAGAATA ATTATATCTA 
CTCTGAGTCA GCGTTGGTAG CCGCCATTGG AATTGAAGAT ATGGTTATCG TGCAAACTAA 
AGATGCCGTT CTTGTGTCTA AAAAGAGTGA TGTACAGCAT GTAAAAAAAA TAGTCGAAAT 
GCTTAAATTG CAGCAACGTA CAGAGTATAT TAGTCATCGT GAAGTTTTCC GACCATGGGG 
AAAATTTGAT TCGATTGACC AAGGTGAGCG ATACAAAGTC AAGAAAATTA TTGTGAAACC 
TGGTGAGGGG CTTTCTTTAA GGATGCATCA CCATCGTTCT GAACATTGGA TCGTGCTTTC 
TGGTACAGCA AAAGTAACCC TTGGCGATAA AACTAAACTA GTCACCGCAA ATGAATCGAT 
ATACATTCCC CTTGGCGCAG CGTATAGTCT TGAGAATCCG GGCATAATCC CTCTTAATCT 
TATTGAAGTC AGTTCAGGGG ATTATTTGGG AGAGGATGAT ATTATAAGAC AGAAAGAACG 
TTACAAACAT GAAGATTAAC ATATGAAATC TTTAACCTGC TTTAAAGCCT ATGATATTCG 
CGGGAAATTA GGCGAAGAAC TGAATGAAGA TATTGCCTGG CGCATTGGGC GTGCCTATGG 
CGAATTTCTC AAACCGAAAA CCATTGTTTT AGGCGGTGAT GTCCGCCTCA CCAGCGAAGC 
GTTAAAACTG GCGCTTGCGA AAGGTTTACA GGATGCGGGC GTCGATGTGC TGGATATCGG 
TATGTCCGGC ACCGAAGAGA TCTATTTCGC CACGTTCCAT CTCGGAGTGG ATGGCGGCAT 
CGAAGTTACC GCCAGCCATA ACCCGATGGA TTACAACGGC ATGAAGCTGG TGCGCGAAGG 
GGCTCGCCCG ATCAGCGGTG ATACCGGACT GCGCGATGTC CAGCGTCTGG CAGAAGCCAA 
TGACTTCCCT CCTGTCGATG AAACCAAACG TGGTCGCTAT CAGCAAATCA ATCTGCGTGA 
CGCTTACGTT GATCACCTGT TCGGTTATAT CAACGTCAAA AACCTCACGC CGCTCAAGCT 
GGTGATCAAC TCCGGGAACG GCGCAGCGGG TCCGGTGGTG GACGCCATTG AAGCCCGATT 
TAAAGCCCTC GGCGCACCGG TGGAATTAAT CAAAGTACAC AACACGCCGG ACGGCAATTT 
CCCCAACGGT ATTCCTAACC CGCTGCTGCC GGAATGCCGC GACGACACCC GTAATGCGGT 
CATCAAACAC GGCGCGGATA TGGGCATTGC CTTTGATGGC GATTTTGACC GCTGTTTCCT 
GTTTGACGAA AAAGGGCAGT TTATCGAGGG CTACTACATT GTCGGCCTGC TGGCAGAAGC 
GTTCCTCGAA AAAAATCCCG GCGCGAAGAT CATCCACGAT CCACGTCTCT CCTGGAACAC 
CGTTGATGTG GTGACTGCCG CAGGCGGCAC CCCGGTAATG TCGAAAACCG GACACGCCTT 
TATTAAAGAA CGTATGCGCA AGGAAGACGC CATCTACGGT GGCGAAATGA GCGCTCACCA 
TTACTTCCGT GATTTCGCTT ACTGCGACAG CGGCATGATC CCGTGGCTGC TGGTCGCCGA 
ACTGGTGTGC CTGAAAGGAA AAACGCTGGG CGAAATGGTG CGCGACCGGA TGGCGGCGTT 
TCCGGCAAGC GGTGAGATCA ACAGCAAACT GGCGCAACCC GTTGAGGCAA TTAATCGCGT 
GGAACAGCAT TTTAGCCGCG AGGCGCTGGC GGTGGATCGC ACCGATGGCA TCAGCATGAC 
CTTTGCCGAC TGGCGCTTTA ACCTGCGCTC CTCCAACACC GAACCGGTGG TGCGGTTGAA 
TGTGGAATCA CGCGGTGATG TAAAGCTAAT GGAAAAGAAA ACTAAAGCTC TTCTTAAATT 
GCTAAGTGAG TGATTATTTA CATTAATCAT TAAGCGTATT TAAGATTATA TTAAAGTAAT 
GTTATTGCGG TATATGATGA ATATGTGGGC TTTTTTATGT ATAACGACTA TACCGCAACT 
TTATCTAGGA AAAGATTAAT AGAAATAAAG TTTTGTACTG ACCAATTTGC ATTTCACGTC 
ACGATTGAGA CGTTCCTTTG CTTAAGACAT TTTTTCATCG CTTATGTAAT AACAAATGTG 
CCTTATATAA AAAGGAGAAC AAAATGGAAC TTAAAATAAT TGAGACAATA GATTTTTATT 
ATCCCTGTTT ACGATATTAT AGCCAAAGTT GTATCCTGCA TCAGTCCTGC AATATTTCAC 
GAGTGCTTTG TTAACTGAAT ACATGTCTGC CATTTTCCAG ATGATAACGA CGTCATCGCA 
ATTGATGGTA AAACACTTCG GCACACTTAT GACAAGAGTC GTCGCAGAGG AGTGGTTCAT 
GTCATTAGTG CGTTTCAGCA ATGCACAGTC TGGTCCTCGG ATAGATCAAG ACGGATGAGA 
AACCTAATGC GTTCACAGTT ATTCATGAAC TTTCTAAAAT GATGGGTATT AAAGGAAAAA 
TAATCATAAC TGATGCGATG GCTTGCCAGA AAGATATTGC AGAGAAGATA TAAAAACAGA 
GATGTGATTA TTTATTCGCT GTAAAAGGAA ATAAGAGTCG GCTTAATAGA GTCTTTGAGG 
AGATATTTAC GCTGAAAGAA TTAAATAATC CAAAACATGA CAGTTACGCA ATTAGTGAAA 
AGAGGCACGG CAGAGACGAT GTCCGTCTTC ATATTGTTTG AGATGCTCCT GATGAGCTTA 
TTGATTTCAC GTTTGAATGG AAAGGGCTGC AGAATTTATG AATGGCAGTC CACTTTCTCT 
CAATAATAGC AGAGCAAAAG AAAGAATCCG AAATGACGAT CAAATATTAT ATTAGATCTG 
CTGCTTTAAC CGCAGAGAAG TTCGCCACAG TAAATCGAAA TCACTGGCGC ATGGAGAATA 
AGTTGCACAG TAGCCTGATG TGGTAATGAA TGAAATCGAC TATAATATAA GAAGGCGAGT 
TGCATTCGAA TGATTTTCTA GAATGCGGCA CATCGCTATT AATATCTGAC AATGATAATG 
TATTCAAGGC AGGATTATCA TGTAAGATGC GAAAAGCAGT CATGGACAGA AACTTCCTAG 
CGTCAGGCAT TGCAGCGTGC GGGCTTTCAT AATCTTGCAT TGGTTTTGAT AAGATATTTC 
TTTGGAGATG GGAAAATGAA TTTGTATGGT ATTTTTGGTG CTGGAAGTTA TGGTAGAGAA 
ACAATACCCA TTCTAAATCA AGAAATAAAG CAAGAATGTG GTTCTGACTA TGCTCTGGTT 
TTTGTGGATG ATGTTTTGGC AGGAAAGAAA GTTAATGGTT TTGAAGTGCT TTCAACCAAC 
TGCTTTCTAA AAGCCCCTTA TTTAAAAAAG TATTTTAATG TTGCTATTGC TAATGATAAG 
a. T & c* n A C AG A GAGTGTCTGA GTCAATATTA TTACACGGGG TTGAACCAAT AACTATAAAA 13440 
fATCCAAATA GCGTTGTTTA TGATCATACT ATGATAGGTA GTGGCGCTAT TATTTCTCCC 13500 
TTTGTTACAA TATCTACTAA TACTCATATA GGGAGGTTTT TTCATGCAAA CATATACTCA 13560 
TACGTTGCAC ATGATTGTCA AATAGGAGAC TATGTTACAT TTGCTCCTGG GGCTAAATGT 13620 
AATGGATATG TTGTTATTGA AGACAATGCA TATATAGGCT CGGGTGCAGT AATTAAGCAG 13680 
GGTGTTCCTA ATCGCCCACT TATTATTGGC GCGGGAGCCA TTATAGGTAT GGGGGCTGTT 13740 
GTCACTAAAA GTGTTCCTGC CGGTATAACT GTGTGCGGAA ATCCAGCAAG AGAAATGAAA 13800 
AGATCGCCAA CATCTATTTA ATGGGAATGC GAAAACACGT TCCAAATGGG ACTAATGTTT 13860 
AAAATATATA TAATTTCGCT AATTTACTAA ATTATGGCTT CTTTTTAAGC TATCCTTTAC 13920 
TTAGTTATTA CTGATACAGC ATGAAATTTA TAATACTCTG ATACATTTTT ATACGTTATT 13980 
CAAGCCGCAT ATCTAGCGGT AACCCCTGAC AGGAGTAAAC AATG 14024 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12441 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: YES 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Salmonella enterica serovar muenchen (serogroup C2) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
GTTGACAAAT ACCGACCGTA TAATGAATCA AACGTTCTGG ATTGGTATTT ATCCAGGCTT 
GACTACAGAG CATTTAGATT ATGTCGTAAG TAAGTTTGAA GAATTTTTTG GTTTAAATTT 
CTAATTTTTA GGATAGGATG CTTGATGTGA ATAAGAAAAT CCTAATGACT GGCGCTACTA 
GCTTTGTAGG TACCCATCTA CTACATAGTC TCATAAAGGA AGGTTATAGT ATTATTGCAT 
TAAAGCGTCC TATAACCGAG CCAACGATTA TCAATACCTT GATTGAATGG TTGAATATAC 
AAGATATAGA AAAAATATGT CAATCATCTA TGAATATTCA TGCGATTGTC CATATTGCAA 
CAGACTATGG TCGAAACAGA ACCCCTATAT CTGAACAATA TAAATGTAAT GTCCTATTAC 
CAACAAGACT GCTTGAGTTA ATGCCAGCGC TTAAAACGAA ATTCTTTATT TCTACTGACT 
CTTTTTTTGG GAAATATGAG AAGCACTATG GATATATGCG TTCTTACATG GCATCTAAAA 
GACATTTTGT AGAACTATCA AAAATATACG TAGAGGAACA TCCAGACGTT TGTTTTATAA 
ATTTACGTTT AGAACATGTT TACGGTGAGA GGGATAAAGC AGGTAAAATA ATCCCGTATG 
TTATCAAAAA AATGAAAAAC AATGAAGATA TTGATTGTAC GATCGCCAGG CAGAAAAGAG 
ATTTTATTTA TATAGACGAT GTTGTTTCGG CCTATTTGAA AATTTTAAAG GAGGGTTTTA 
ACGCTGGACA CTATGATGTC GAGGTGGGGA CTGGAAAATC GATAGAGCTA AAAGAAGTGT 840 

TTGAGATAAT AAAAAAAGAA ACGCATAGTA GTAGTAAGAT AAATTATGGT GCAGTTGCGA 900 
TGCGTGATGA TGAGATTATG GAGTCACATG CAAATACCTC TTTCTTGACT CGATTAGGTT 
GGAGTGCCGA GTTTTCTATT GAGAAGGGTG TGAAAAAAAT GTTGAGTATG AAAGAGTAAT 
GAATCGTATT ATTAGAATGT TAGGTGTAGA TAAAGCAATT CGTTATGTTA TTTTTGGTAA 

GATAATATCT GTATTAACGG GTTTACTGTT AATAATGTTA ATATCACACC ATTTATCTAA 1140 

AGACGCACAG GGCTATTATT ATACATTTAA TTCAGTAGTG GCACTACAGA TAATATTTGA 1200 

ATTGGGGCTA TCAACGGTAA TCATTCAATT CGCTAGCCAT GAAATGTCAG CGTTAAAATA 1260 

TGATTATTCT GAACGAGATA TTATAGGTGA AAGTAAAAAT AAGCAACGTT ACCTATCGTT 1320 

ATTTCGGTTG GCAATAAAAT GGTATGCAGT AATAGCTTTG CTAATAATAT TAATAGTCGG 1380 

TCCCATCGGG TATGTTTTTT TTACGCAAAA AGAAGGCTTA GGTGTACCTT GGCAAGGGGC 1440 
ATGGTTATTA TTAACAATAG TTACAGCTTT TAATATTTTT CTTGTTTCTG TACTTTCTGT 
CGCTGAAGGG AGTGGGTTAA TTACTGATGT GAATAAAATG AGAATGTATC AGTCGCTGTT 

AGCTGGTATA TTGGCAGTAA GCTTACTTAT TAGTGGCTTT GGACTATATG CTACGTCTGC 1620 

AATAGCTATT TCAGGGACTA TCATATTCTC CATATTTTCA TATAAGTATT TTAAAAAAAT 1680 

TTTCCTGCAA TCTTTAAAGC ATAAAAATAA ATATACTGAA GGTGGTATTT CATGGGTTAA 1740 

TGAAATATTT CCTATGCAAT GGCGAATTGC TCTAAGTTGG ATGTCAGGGT ATTTTATTTA 1800 

TTTTGTTATG ACCCCCATTG CATTCAAATA TTTCGGGGCT ATATATGCAG GGCAGTTAGG 1860 

GATGTCTTTA ACATTATGCA ATATGGTAAT GGCTACGGGC CTGGCTTGGA TATCCACTAA 1920 

ATATCCAAAA TGGGGAGTAA TGGTTTCCAA CAAACAGCTT GCGGAACTGA GTAAATCGTT 1980 

CAAAAGTGCA GTAATGCAAT CATCCTTTTT TGTCTTGACA GGATTAACTG GTGTATACAT 2040 

TTCATTATGG TTATTGAAAT TATCTGGTTC AAACATTGGC GAGCGGTTTT TGGGATTGCA 2100 

GGATTTTTTC TTTTTATCTT TAGCAATTAT TGGTAATCAC ATTGTAGCTT GCTTTGCAAC 2160 

CTATATAAGA GCGCATAAAA CTGAAAAAAT GACATTGGCA TCATGTATAA TGGCTCTCTT 2220 

GACTATAACT ACAATGTTGT TTGTTGCATA TTTAGAGTAC TCGAGGTTCT ACATGTTAAT 2280 

GTATGCAGCA CTAACGTGGT TATATTTTGT TCCTCAAACT TATATAATCT TTAAAAGATT 2340 

CAAGAGTTCT TATGAGTAAA AAACCTCTTC TTACTATTGC TATTCCGACA TATAACCGCT 2400 

CTTCATGTTT GGCTCGTTTA CTTGATAGTA TAATTCAACA GGAGAACTAT TGTCATGATG 2460 

AACTCGAGGT TATTGTTTGT GATAATGCTT CAACAGATGA AACAGCAAGA ATAGCCAAGA 2520 

GTGGCTTAGA TAAAATAAGA AATAGTACTT ATCATCTAAA TGAAGAAAAC TTAGGAATGG 2580 

ATGGTAACTT CCAGAAATGT TTTGAGTTAT CAAATGGAAA ATATCTTTGG ATGATTGGCG 2640 

ATGATGATCT AATAGTCAAA AATGGTATTT CGAAGGTTTT TTCGATATTA AAGTCCCGGC 2700 

CTGCATTAGA TATGGTGTAT GTAAATTCAG CAGCAAAGAC TGAGTTAAAC TATAATGCTG 2760 

ATGTGAGGAC GTCATTCTAC ACAAATGATG TAGATTTTAT TTCAGACGTG AAAGTTATGT 2820 
TCACGTTTAT TTCTGGAATG ATATGTAAGA AAACTGATGC AATTGTCAAA GCCGTTGGTA 
TTTTCAGTCC GCAAACTACT GGAAAATATC TTATGCATTT AACATGGCAA TTGCCATTAC 
TTAAACAGGG TGGAGAGTTC GCAGTTATCC ATAATAATAT AATTGAGGCT GAGCCAGATA 
ATTCAGGTGG ATATCATTTA TATAAGGTTT TTTCTAATAA TCTTGCGACA ATCTTTGATG 
TTTTTTATCC CAGAGAGCAC CGTGTAAGTA AAAGAGTTCG CGCATCAGCA TGTTTATTCT 
TACTTAACTT CATAGGCGAT GAAGATAAAA CCAAAAATTT TGCTACAAAT AATTATTTAA 
GAGATTGCGA TAGTGCATTT ATAGATTTAA TTATATATAA ATATGGGCTT AGGTTTTTCT 
ATCTATATCC TAAAACTGTG CCTTTATTTA GAAAAATAAA ATATATTATA AAGACGGTTT 
TAATGCGGAA ATAAAAATTA TTCAAGATGG TTTGCTGAAA ACGACTTATA GGACTATCTA 
ATGTTTGTCT ATAGTTTAAG ATTAAAATTA AATCTTATCA TATCATTATT GAGTAAAGTT 
AGGCGGAAAT CAAAAGCAAA GTTTCTTGTT CTGCTTAGCG GATATGATTT TAAAATGGTT 
GGGAAGAATT TTAAATTGAA TGTCAAACCT TACTCTGCAA AAAATAACAC CTCTTCCAAA 
TGGGGTAGTA TGCGGGTTGG TGATAACTGC TGGATTGAAG CTGTATATAA TTATGGTGAT 
GAAAAATTTG AACCTTATTT GTACATAGGT GATCGTATAT GTTTAAGTGA TAATGTTCAT 
ATTTCTTGCG TATCATGTTT AATTTTAGAA AACGATATAT TAATTGGTAG CAAAGTTTAT 
ATAGGCGATC ATAGCCATGG CAGTTATAAA GTATGCAGTC CGAAAATAGA ACCGCCAGCA 
AATAAGCCAT TAGGTGATAT TGCTCCTATT AAAATAGGTA ATTGCTGCTG GATTGGAGAT 
AATGCAGTAA TTCTGGCTGG TAGTGAAATT TGTGATGGCT GTGTAATCGC AGCTAATTCA 
GTCGTCAAGG ATTTAAAAGT CGATAAGCCA TGTTTAATTG GTGGGGTTCC TGCTAAAGTA 
ATAAAGGTAT TTTAAAATGA ATGTTTTTAT CAGTATTTGT ATACCGTCTT ATAATAGAGC 
TGAGTTTTTA GAGCCACTAC TGGATAGCAT ATATAATCAA GATTATTGTT TAAAGAATAA 
TGATTTTGAG GTCATTGTTT GTGAAGATAA ATCTCCACAG AGAGATGAGA TAAACTCTAT 
TATCGAAAAC TATAAAGCAA AAAATAATAA ACAAAATCTT TATGTTAATT TCAATGAAGA 
TAATTTAGGC TATGATAAGA ATTTAAAAAA ATGCATTAGT TTGACGACAG GTAAATATTG 
CATGATCATG GGCAACGATG ATCTATTAGC AGATGGAGCG TTATCAAAAA TAGTGAAAGT 
TTTGAAGGCT AATCCTGAAA TTGTATTGGC TACGCGAGCG TATGGTTGGT TTAAGGAAAA 
TCCGAATGAG TTATGTGATA CTGTTCGTCA TTTAACAGAC GATACTTTAT TTCAGCCGGG 
GGCTGATGCC ATTAAATTTT TCTTCCGTAG AGTTGGAGTT ATTTCAGGCT TTATTGTCAA 
TGCTGAAAAA GCAAAAAAAC TATCGAGTGA TTTATTTGAT GGGCGTTTAT ATTATCAAAT 
GTACCTTGCT GGTATGCTAA TGGCTGAAGG TCAGGGATAC TATTTTAGCG ACGTGATGAC 
ATTGTCGAGG GATACAGAGG CTCCTGACTT TGGTAACGCT GGAACTGAAA AAGGAGTTTT 
CACCCCGGGG GGGTATAAAC CAGAGGGCCG TATACATATG GTTGAAGGCT TGTTGCTAAT 
TGCAAAATAT ATAGAAGATA CAACAAAAAT TGATGGCGTT TATGCTGGAA TTAGAAAAGA 
CTTAGCGAAC TATTTTTATC CTTATATTCG AGATCAACTC GACTTGCCTC TTTATACTTA 
TATTAAAATG ATAAATAAAT TTCGGAAAAT GGGATTTTCA AATGAAAAGC TTTTCTATGT 
GCATGCCTTT TTAGGGTATG TACTAAAACG GAGGGGCTAT GATGCTTTAA TTAAATACAT 
TCGTAGCAAA AAAGGCGGTA CTCCGCGTCT TGGTATTTAA CCTCCACTTT CAAAAAATGT 
TATGAATATA CTTCTTGCTG CGATATTAGG CGTTAACTTA TTTTCTCCAT ATATTAGTTC 
GTGGATGGTG GGTATGCTGC CATTTCCACC AGGAGCAATC CTAAGGGATG TACTCAATGT 
ATTTTTTGTG GCGTTAGTGC TAGTTCGATT TGTCATTGAT AGGAAAAAAA CTTATTTCCC 
GTTGGTTTTT ACTATTTTTT CATGGTCGGC GGTAATACTA TGGGTAATAG CGTTAACTAT 
ATTCTCACCG GATAAAATTC AAGCAATTAT GGGGGGGCGG AGTTATATTT TATTCCCGGC 
AGTTTTCATA GCATTAGTGA TTTTAAAAGT ATCATACCCG CAATCCTTAA ATATTGAAAA 
AATAGTTTGC TACATAATTT TTCTAATGTT TATGGTTGCG ACAATATCTA TTATTGATGT 
ACTAATGAAT GGAGAGTTCA TTAAATTGCT CGGATATGAT GAGCATTATG CAGGAGAACA 
ATTAAACTTA ATTAATAGCT ATGATGGGAT GGTCCGGGCT ACAGGCGGTT TTAGTGATGC 
TCTCAATTTT GGATATATGC TCACATTAGG TGTTTTGTTA TGTATGGAGT GTTTTTCCCA 
AGGATATAAA AGATTATTGA TGCTTATTAT TAGTTTTGTG CTATTTATAG CGATCTGCAT 
GAGTCTTACT AGAGGAGCAA TACTTGTTGC TGCGCTTATT TACGCACTTT ATATAATTTC 
AAATCGGAAG ATGCTTTTTT GTGGAATAAC TTTATTTGTA ATAATTATAC CCGTTTTAGC 
AATTTCTACT AATATTTTTG ACAACTATAC AGAAATTTTG ATCGGCAGGT TTACAGATTC 
GTCTCAGGCA TCGCGTGGAT CTACACAGGG GCGGATAGAT ATGGCAATTA ATTCATTAAA 
CTTCCTGTCA GAACATCCAT CAGGTATAGG TCTGGGTACT CAAGGTTCAG GAAACATGCT 
TTCGGTAAAA GATAATAGGT TAAATACGGA TAATTATTTT TTCTGGATCG CCCTTGAGAC 
TGGTATTATT GGCTTAATCA TAAATATTAT TTATCTGGCA AGTCAATTTT ATTCTTCAAC 
TTTACTAAAT AGAATATATG GCAGTCATTG TAGCAATATG CACTATAGAT TATATTTTCT 
CTTTGGAAGT ATATATTTTA TAAGTGCAGC GTTAAGTTCA GCACCTTCGT CATCAACTTT 
TTCTATATAT TATTGGACAG TTTTAGCTTT GATTCCATTT TTAAAATTAA CAAATAGACG 
GTGCACGCGA TAATGAATAA TAAAAAGGTT TTGATGGATA TTAGTTGGTC TAATAAAGGG 
GGGATTGGAC GTTTTACTGA TGAAATTTCT AAACTACTAT GTGATATATC TAAGGAGGAA 
CTATATAGAA AATGTGCTTC TCCGCTGGCC CCATTAGGTT TAGCAGTCAA TATTTTTCTG 
CGAAAGAAAA CTGATGTGGT TTTTCTTCCT GGCTATATTC CACCACTTTT TTGTTCGAAA 
AAGTTCATAA TAACAATACA TGATCTAAAT CATCTGGATT TAAATGATAA TTCCTCTCTT 
TTTAAGAGGT TATTTTATAA TTTTATAATA AAGCGCGGTT GTAGAAAAGC ATATAAAATA 
TTTACAGTTT CGAATTTTTC AAAAGAAAGA ATAGTAGCAT GGTCAGGTGT AAACCCTAAT 
AAAATAGTCA CGGTATATAA TGGGGTATCT AGTCTATTTA ATGCCGATGT AAAACCATTG 
AATTTAGGCT ATAAATATTT GCTATGTGTA GGAAACAGAA AAACTCATAA GAATGAGAAG 
TGTGTTATAT CTGCCTTTGC CAAAGCAGAT ATTGATCCAT CAATAAAACT CGTTTTTACT 
GGTAATCCTT GTAATGATTT AGAAAAACTA ATAATACAAC ATGGTTTAAG TGAACGTGTA 
AAGTTCTTTG GGTTCGTGTC TGAAAAAGAT TTACCATCGT TATATAAGGG CTCGTTAGGA 
TTAGTTTTCC CTTCTTTATA TGAAGGTTTT GGATTACCTG TAGTGGAGGG CATGGCCTGT 
GGTATTCCTG TATTAACTTC TCTAACTTCA TCATTGCCAG AGGTGGCTGG AGATGCAGCG 
ATTCTTGTCG ACCCTCTTTC GGAAGATGCT ATTACTAAAG GAATTTCGAG GTTAATTAAT 
GATTCTGAAC TTCGTAAGCA TTTAATCCAA AAGGGGCTTT TGCGGGCAAA GAGGTTCAAT 
TGGCAAAACG TGGTTAGTGA GATTGAAATG GTACTGACAG AGGCATGTGA TGGAAATAAA 
TGAAATAAAA ATATCTCTCG TTCATGAGTG GTTATTAAGT TATGCAGGCT CCGAACAGGT 
ATCATCTGCC ATCCTGCATG TTTTTCCTGA AGCGAAGTTA TATTCGGTGG TTGATTTTCT 
AACGGATGAA CAAAGAAGAC ATTTTCTGGG GAAATATGCG ACTACCACAT TTATTCAAAA 
TTTACCTAAA GCTAAAAAAT TTTACCAGAA ATATTTACCA CTAATGCCAC TGGCTATTGA 
ACAACTTGAT TTATCAGATG CTAATATCAT CATTAGTAGC GCCCATTCCG TTGCAAAAGG 
TGTTATTTCC GGACCAGATC AGCTTCACAT TAGCTATGTT CATTCTCCTA TTCGATATGC 
GTGGGATTTA CAGCATCAGT ACCTTAATGA GTCTAACCTG AATAAAGGAA TTAAAGGTTG 
GTTAGCAAAA TGGCTTCTTC ACAAAATACG AATTTGGGAT TCTCGAACCG CAAATGGGGT 
TGATCATTTT ATAGCTAATT CTCAATATAT CGCGCGTAGA ATTAAAAAAG TATACAGACG 
TGAGGCTTCA GTTATATATC CGCCTGTAGA TGTGGATAAT TTTGAAGTAA AAAATGAAAA 
GCAAGACTAT TATTTCACAG CATCCCGTAT GGTACCCTAC AAACGTATTG ATCTTATTGT 
CGAAGCCTTT AGTAAAATGC CGGAAAAGAA ATTAGTAGTT ATTGGTGATG GACCGGAGAT 
GAAAAAAATA AAGAGCAAGG CTACAGACAA TATAAAATTG CTCGGTTATC AATCTTTTCC 
TGTTTTAAAA GAGTATATGC AGAGCGCCAG GGCGTTTGTT TTTGCAGCGG AAGAGGACTT 
TGGAATAATA CCTGTCGAAG CTCAAGCTTG CGGTACCCCT GTTATTGCCT TTGGGAAGGG 
TGGGGCCTTA GAAACCGTTC GCCCACTAGG TGTAGAGGAA CCGACTGGCA TTTTCTTCAA 
GGAACAGAAT ATTGCTTCTT TGCATGAAGC TGTTAGTGAA TTTGAAAAAA ATGCATCATT 
TTTTACATCT CAGGCTTGTA GAAAAAATGC AGAAAAATTT TCTCGATCAA GATTTGAACA 
AGAATTTAAG AACTTTGTTA ATGAAAAGTG GAATCTTTTC AAAACAGAAC AGATTATTAA 
ACGTTAATTA TGGTTTATTG AATGTCTAAA TTAATACCAG TAATAATGGC CGGTGGGATT 
GGTAGCCGTT TGTGGCCACT TTCACGTGAA GAGCATCCGA AACAGTTTTT AAGCGTAGAT 
GGTGAATTAT CTATGCTGCA AAACACCATT AAAAGATTGA CTCCTCTTTT GGCTGGAGAA 
CCTTTAGTCA TTTGTAATGA TAGTCACCGC TTCCTTGTCG CTGAACAACT TCGAGCTATA 
AATAAACTAG CAAATAACAT CATATTAGAG CCAGTGGGGC GTAATACAGC CCCAGCTATA 
GCGCTGGCCG CTTTTTGTTC ACTTCAGAAT GTCGTCGATG AAGACCCGCT TTTGCTTGTC 
CTTGCTGCGG ATCATGTCAT CCGCGATGAG AAAGTGTTTC TTAAAGCTAT CAATCACGCT 
GAATTTTTTG CAACACAAGG TAAGCTAGTA ACGTTTGGTA TTGTACCCAC ACAGGCCGAA 
ACTGGCTACG GTTATATTTG TAGAGGTGAA GCAATCGGGG AAGATGCTTT TTCTGTAGCC 
GAATTTGTAG AGAAGCCTGA TTTCGATACA GCGCGTCATT ATGTAGAATC AGAGAAATAT 
TATTGGAACA GCGGTATGTT CCTATTTCGT GCAAGTAGTT ACTTACAAGA ATTAAAGGAT 
CTGTCCCCCG ATATTTACCA AGCATGTGAA AATGCGGTAG GGAGTATTAA TCCTGATCTT 
GATTTTATCC GTATTGATAA AGAAGCATTC GCAATGTGCC CTAGTGATTC TATCGATTAT 
GCGGTAATGG AACATACTAG GCATGCAGTT GTCGTACCGA TGAATGCCGG CTGGTCAGAT 
GTGGGGTCAT GGTCTTCACT GTGGGATATT TCTAAGAAAG ATCCACAACG TAATGTATTA 
CATGGCGATA TTTTTGCATA TAATAGTAAA GATAATTATA TCTATTCTGA AAAATCGTTT 
ATTAGTACAA TCGGAGTAAA TAATTTAGTT ATCGTGCAGA CAGCAGATGC ATTATTAGTA 
TCTGATAAAG ATTCAGTCCA GGATGTTAAA AAAGTTGTTG ATTATTTAAA AGCTAATAAT 
AGAAACGAAC ATAAAAAACA TTTAGAGGTT TTCCGACCGT GGGGAAAATT TAGCGTAATT 
CATAGTGGCG ATAATTATTT AGTTAAAAGA ATAACTGTTA AACCAGGCGC GAAGTTTGCT 
GCTCAGATGC ATCTCCATCG TGCTGAGCAT TGGATAGTGG TATCTGGTAC TGCTTGTATT 
ACTAAGGGGG AAGAAATTTT TACAATTTCG GAGAATGAAT CAACATTTAT ACCTGCTAAT 
ACAGTTCATA CGTTAAAAAA CCCCGCGACT ATTCCATTAG AACTAATAGA AATTCAATCT 
GGCACCTATC TTGCGGAGGA TGATATTATT CGCCTGGAGA AACATTCTGG ATATCTGGAG 
TAATGAATTG ATGAAAAATA TATATAATAC TTACGATGTT ATCAACAAAT CTGGAATTAA 
TTTTGGAACC AGTGGTGCCC GCGGCCTTGT TACCGATTTT ACACCCGAAG TTTGCGCACG 
ATTTACCATT TCCTTTTTGA CAGTAATGCA GCAAAGATTC TCATTTACAA CGGTTGCGCT 
CGCAATTGAT AATCGTCCAA GCAGTTACGC GATGGCTGAA GCTTGTGCCG CTGCTTTGCA 
AGAAAAAGGA ATTAAAACCG TTTACTATGG CGTAATTCCA ACACCTGCTT TAGCTCATCA 
ATCAATTTCC GATAAAGTAC CTGCAATCAT GGTTACTGGC AGTCATATCC CTTTTGACCG 
TAATGGCCTG AAATTTTATA GACCAGATGG TGAAATTACT AAAGATGATG AGAATGCTAT 
TATTCATGTT GATGCCTCAT TTATGCAGCC TAAGCTTGAA CAATTGACAA TTTCCACAAT 
CGCTGCTAGA AATTATATTC TACGATATAC CTCATTATTT CCAATGCCAT TCTTGAAAAA 
TAAGCGCATT GGAATTTATG AGCATTCTAG TGCGGGTCGT GATCTCTATA AGACGTTATT 
CAAAATGTTG GGTGCTACAG TTGTTAGTTT AGCAAGGAGC GACGAATTTG TTCCTATTGA 
TACTGAAGCT GTAAGTGAAG ATGATAGAAA TAAAGCAATC ACATGGGCAA AAAAATATCA 
GTTAGATGCT ATATTTTCAA CTGATGGTGA TGGAGATCGC CCTCTGATAG CTGACGAATA 
TGGAAATTGG TTAAGAGGAG ATATATTAGG CCTTCTGTGC TCTCTCGAAT TAGCTGCTGA 
TGCAGTCGCT ATTCCTGTAA GCTGCAACAG TACAATCTCA TCTGGTAACT TTTTTAAACA 
TGTGGAACGA ACAAAGATTG GTTCACCCTA TGTGATTGCA GCATTTGCTA AATTATCTGC 
AAACTATAAT TGTATAGCTG GTTTTGAAGC GAATGGTGGC TTTCTGCTAG GTAGCGATGT 
TTATATTAAT CAGCGTTTAC TTAAGGCATT ACCAACACGT GATGCTTTAT TACCTGCCAT 
TATGCTTCTG TTTGGTAGCA AGGACAAAAG TATTAGTGAG CTTGTTAAAA AACTTCCTGC 
TCGCTATACC TATTCAAACA GATTACAGGA TATAAGTGTT AAAACAAGTA TGTCTTTAAT 
AAATCTTGGT CTGACAGATC AAGAGGATTT TTTGCAGTAT ATTGGTTTTA ATAAACATCA 
TATATTACAT TCTGATGTTA CTGATGGCTT TAGAATCACT ATCGATAACA ACAATATTAT 
TCATTTACGA CCTTCAGGCA ATGCCCCTGA GTTGCGTTGC TATGCGGAGG CTGACTCGCA 
AGAGGATGCA TGTAATATTG TTGAAACTGT TCTCTCTAAT ATCAAAAGCA AACTGGGTAG 
AGCTTAATGC TGTTGATAAT AGAGCGTTTC TTTCCAGTAA TACTTTGTCT GGTTATCTGG 
TACCCAAGTT GAGGGTGAGA ATTAAATGGA TCGTTTTGAT AATAAGTATA ACCCAAATTT 
ATGCAAAATA TTATTGGCTA TATCAGATTT ACTGTTTTTT AATGTAGCCT TATGGGCATC 
GTTAGGAGTT GTATATTTAA TCTTTGATGA AGTTCAGCGA TTTGTACCAC AAGAGCAATT 
AGATAATCGA TTTATATCAC ATTTTATTCT ATCTATAGTA TGCGTTGGAT GGTTTTGGGT 
TCGACTGCGT CACTATACAT ATCGAAAGCC ATTCTGGTAT GAGTTGAAAG AGGTTATTCG 
TACTATCGTT ATTTTTGCTG TGTTTGATTT GGCTTTAATT GCGTTTACAA AATGGCAGTT 
TTCACGCTAT GTCTGGGTGT TTTGTTGGAC TTTTGCCATA ATCCTGGTGC CTTTTTTTCG 
CGCACTTACA AAGCATTTAT TGAACAAGCT AGGTATCTGG AAGAAAAAAA CTATCATCCT 
TGGGAGCGGA CAGAATGCTC GTGGTGCATA TTCTGCGCTG CAAAGTGAGG AGATGATGGG 
GTTTGATGTT ATCGCTTTTT TTGATACGGA TGCGTCAGAT GCTGAAATAA ATATGTTGCC 
GGTGATAAAG GACACTGAGA CTATTTGGGA TTTAAATCGT ACAGGTGATG TCCATTATAT 
CCTTGCTTAT GAATACACCG AGTTGGAGAA AACACATTTT TGGCTACGTG AACTTTCAAA 
ACATCATTGT CGTTCTGTTA CTGTCGTCCC CTCGTTTAGA GGATTGCCAT TATATAATAC 
TGATATGTCT TTTATCTTTA GCCATGAAGT TATGTTATTA AGGATACAAA ATAACTTGGC 
TAAAAGGTCG TCCCGTTTTC TCAAACGGAC ATTTGATATT GTTTGTTCAA TAATGATTCT 
TATAATTGCA TCACCACTTA TGATTTATCT GTGGTATAAA GTTACTCGAG ATGGTGGTCC 
GGCTATTTAT GGTCACCAGC GAGTAGGTCG GCATGGAAAA CTTTTTCCAT GCTACAAATT 
TCGTTCTATG GTTATGAATT C 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22080 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: YES 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Salmonella enterica serovar typhimurium (serogroup B) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GAATTCGGGA GGCGCAATGA AAGTCAGCTT TTTTCTGCTG AAATTTCCAC TCTCATCGGA 
AACCTTTGTG CTGAATCAGA TTACTGCGTT TATTGATATG GGCCATGAGG TGGAGATTGT 
CGCGTTACAA AAAGGCGATA CCCAACATAC TCACGCCGCC TGGGAGAAGT ATGGCCTGGC 
GGCGAAAACC CGCTGGTTAC AGGATGAGCC CCAGGGACGG CTGGCGAAAC TGCGCTACCG 
GGCATGTAAA ACGCTGCCGG GGCTGCATCG GGCGGCGACC TGGAAAGCGC TCAATTTTAC 
CCGCTATGGC GATGAATCAC GCAATTTGAT CCTTTCCGCG ATTTGCGCGC AGGTGAGCCA 
GCCTTTTGTG GCGGATGTGT TTATCGCACA CTTTGGTCCG GCGGGCGTGA CGGCGGCCAA 
ACTACGCGAA CTGGGCGTGC TTCGCGGCAA AATCGCGACT ATTTTCCACG GGATTGATAT 
CTCTAGTCGT GAGGTGCTCA GTCATTACAC GCCGGAGTAT CAGCAGTTGT TTCGTCGTGG 
CGATCTGATG CTGCCCATCA GCGATCTGTG GGCCGGTCGC CTGAAAAGTA TGGGCTGTCC 
GCCGGAAAAG ATTGCCGTTT CGCGCATGGG CGTCGACATG ACGCGTTTTA CCCATCGTTC 
GGTGAAAGCG CCAGGGATGC CGCTGGAGAT GATTTCCGTC GCGCGCCTGA CAGAAAAAAA 
AGGCCTGCAT GTGGCGATTG AAGCCTGTCG GCAACTGAAA GCACAGGGCG TGGCGTTTCG 
CTACCGCATT CTGGGGATTG GCCCGTGGGA ACGTCGGCTG CGCACGCTCA TCGAGCAGTA 
TCAGCTAGAG GATGTCATTG AGATGCCGGG GTTTAAACCG AGCCATGAAG TGAAGGCGAT 
GCTGGATGAC GCCGATGTTT TTTTGCTGCC GTCGATTACC GGTACGGATG GCGATATGGA 
AGGTATTCCG GTAGCGCTGA TGGAGGCGAT GGCGGTAGGG ATTCCCGTGG TATCTACCGT 
GCATAGCGGT ATTCCGGAAC TGGTGGAGGC CGGCAAATCC GGCTGGCTGG TGCCGGAAAA 
CGATGCGCAG GCGCTGGCGG CCCGACTCGC TGAGTTCAGC CGGATTGACC ACGACACGCT 
GGAGTCGGTG ATCACGCGCG CCCGTGAAAA AGTGGCGCAA GATTTTAATC AGCAGGCGAT 
TAATCGCCAG TTAGCCAGCC TGCTACAAAC GATATAAACG AGGTGGTATG CCCGCGACTA 
AATTCTCCCG ACGTACCCTC CTGACGGCAG GTTCTGCGCT TGCTGTTCTT CCTTTTCTGC 
GCGCCTTGCC GGTACAGGCG CGTGAACCTC GCGAGACCGT CGATATTAAG GATTATCCGG 
CGGATGACGG TATCGCCTCG TTCAAACAGG CCTTCGCCGA CGGACAGACC GTGGTCGTAC 
CGCCAGGATG GGTGTGTGAA AATATCAATG CGGCGATAAC GATTCCGGCG GGAAAAACGC 
TGCGGGTACA GGGCGCGGTG CGTGGGAATG GCCGGGGACG GTTTATTTTG CAGGACGGGT 
GTCAGGTGGT GGGGGAGCAG GGCGGCAGTC TGCACAATGT GACGCTGGAT GTTCGCGGGT 
CGGACTGTGT GATTAAAGGC GTGGCGATGA GCGGCTTTGG CCCCGTCGCG CAAATTTTCA 
TCGGTGGTAA GGAACCGCAG GTGATGCGTA ATCTCATTAT CGATGACATC ACCGTTACCC 
ACGCCAACTA CGCCATTCTC CGCCAGGGAT TTCATAACCA AATGGATGGC GCGCGGATTA 
CGCATAGCCG CTTTAGCGAT TTACAGGGGG ACGCCATTGA GTGGAATGTC GCGATTCACG 
ACCGCGACAT CCTGATTTCC GATCATGTCA TCGAACGCAT TAATTGTACC AATGGCAAAA 
TCAACTGGGG GATCGGCATC GGGCTGGCGG GTAGCACCTA TGACAACAGT TATCCTGAAG 
ACCAGGCAGT AAAAAACTTT GTGGTGGCCA ATATTACCGG ATCTGATTGC CGACAGCTTG 
TGCACGTAGA AAATGGCAAA CATTTCGTCA TTCGCAATGT CAAAGCCAAA AACATCACGC 
CCGGTTTCAG TAAAAATGCG GGTATTGATA ACGCAACGAT CGCAATTTAT GGCTGTGATA 
ATTTCGTCAT TGATAATATT GATATGACGA ATAGTGCCGG GATGCTCATC GGCTATGGCG 
TCGTTAAAGG AAAATACCTG TCAATTCCGC AAAACTTTAA ATTAAACGCT ATTCGGTTGG 
ATAATCGCCA GGTTGCTTAT AAATTACGCG GCATTCAAAT TTCCTCCGGC AACACCCCCT 
CTTTTGTCGC CATCACCAAT GTACGGATGA CGCGTGCTAC GCTGGAACTG CATAATCAAC 
CGCAGCACCT CTTTCTGCGC AATATCAACG TGATGCAAAC TTCAGCGATT GGCCCGGCGT 
TAAAAATGCA TTTCGATTTG CGTAAAGATG TACGTGGTCA ATTTATGGCC CGCCAGGACA 
CGCTGCTTTC CCTCGCTAAT GTTCATGCCA TCAATGAAAA CGGGCAGAGT TCCGTGGATA 
TCGACAGGAT TAATCACCAA ACCGTGAATG TCGAAGCAGT GAATTTTTCG CTGCCGAAGC 
GGGGAGGGTA AGTACCGCTA TTTTTACGAA AATTCCTGGG AAAAAGTTGT TCATACTTAA 
TGTTATGGTG CCGACTAAGA CGTAATGTAG AGCGTGCCAT CATTATCCCT GGCAGCAGAG 
TAATTCATGC TGGCGAAAAC AAGCTAAAGA GCTATAATTC AGCAACCATT TTACAGGTGG 
AAGAAACAAT GATGAATTTG AAAGCAGTTA TACCGGTAGC GGGTTTGGGT ATGCATATGT 
TGCCTGCCAC CAAGGCAATC CCAAAAGAGA TGCTACCGAT CGTCGACAAG CCAATGATTC 
AGTACATTGT CGATGAGATT GTGGCTGCAG GGATCAAAGA AATCGTGCTG GTGACTCACG 
CGTCTAAAAA CGCCGTTGAG AACCACTTCG ACACCTCTTA TGAACTTGAA TCACTTCTTG 
AGCAGCGCGT TAAGCGTCAG CTTTTGGCGG AAGTGCAATC TATCTGCCCA CCGGGCGTGA 
CGATTATGAA CGTTCGCCAG GCGCAGCCGT TAGGGCTGGG GCATTCTATT CTGTGCGCGC 
GTCCGGTCGT GGGCGATAAC CCTTTCATTG TGGTACTCCC GGATATTATT ATCGATGATG 
CTACCGCCGA TCCGCTGCGC TATAACCTTG CGGCGATGGT GGCGCGTTTC AATGAAACGG 
GTCGCAGCCA GGTGCTGGCG AAGCGCATGA AAGGTGATTT ATCGGAGTAT TCCGTTATCC 
AGACGAAAGA ACCTCTGGAT AATGAAGGCA AAGTCAGCCG GATTGTGGAG TTTATCGAAA 
AACCGGATCA GCCGCAGACG CTGGATTCCG ATTTGATGGC GGTAGGCCGT TATGTGCTTT 
CAGCCGACAT CTGGGCGGAA CTGGAAAGAA CCGAACCGGG CGCCTGGGGC CGCATCCAGC 
TCACCGATGC CATTGCTGAA CTGGCGAAAA AACAGTCGGT TGACGCGATG CTAATGACGG 
GTGACAGCTA TGACTGCGGT AAAAAAATGG GCTACATGCA GGCATTTGTG AAGTACGGGC 
TGCGCAACCT GAAAGAAGGA GCCAAGTTCC GTAAGAGCAT AGAGCAGCTT TTGCATGAAT 
AAGTATTAAC AACCGTGATA AATGGTTGGT GATAAACATA ATAACGGCAG TGAACATTCG 
AAGCGGCAAG TTGGCTGAAA CGAGTGTTGA CTGCCGTTTT AGTTTTGTAT AAAGGGCTTA 
AGTAACAAGG GGTTATCTGG AGCATTTTAA TGCTGATTTT ATAAGATTAA TCCTTGTTTC 
CGGATGCAAT TAATAAGACA ATTAGCGTTT AAGTTTTAGT GAGCTTTGCC CTGCTGGGCG 
AGGTTTGCAA CAAGTCGATA TGTACGCAGT GCACTGGTAG CTGATGAGCC AGGGGCGGTA 
GCGTGTGTAA CGACTTGAGC AATTAATTTT TATTGGCAAA TTAAATACCA CATTAAATAC 
GCCTTATGGA ATAGAAAAGT GAAGATACTT ATTACTGGCG GGGCAGGTTT TATTGGATCA 
GCTGTTGTCC GCCATATTAT TAAGAATACA CAGGACACTG TAGTTAATAT TGATAAATTA 
ACCTACGCCG GTAATCTTGA ATCCCTTTCT GATATTTCTG AAAGTAATCG CTACAATTTT 
GAACACGCGG ATATTTGTGA TTCCGCTGAA ATAACGCGTA TTTTTGAGCA GTACCAGCCG 
GACGCGGTGA TGCATTTGGC TGCGGAAAGT CATGTGGACC GTTCGATTAC CGGGCCAGCA 
GCATTTATTG AAACCAATAT CGTCGGCACC TATGCACTTC TTGAAGTTGC GCGTAAATAC 
XGGTCTGCCC TTGGCGAAGA TAAAAAAAAT AATTTTCGTT TTCATCATAT TTCCACTGAT 
GAAGTTTACG GCGATTTACC GCATCCTGAT GAAGTTGAAA ACAGCGTTAC GCTGCCGTTA 
TTTACTGAAA CGACGGCATA TGCGCCAAGT AGCCCCTATT CTGCGTCAAA AGCATCCAGC 
GATCATTTAG TCCGTGCCTG GCGGCGTACC TATGGTCTAC CAACGATCGT TACCAATTGT 
TCTAATAACT ATGGCCCTTA TCACTTCCCT GAAAAACTGA TTCCGTTGGT CATTTTGAAC 
GCACTGGAAG GAAAGCCTTT GCCAATTTAT GGCAAAGGGG ATCAGATTCG CGATTGGCTA 
TATGTAGAAG ATCATGCTCG CGCGCTTCAT ATGGTAGTGA CTGAAGGCAA GGCAGGGGAG 
ACTTATAACA TTGGTGGACA CAATGAGAAG AAAAATCTCG ATGTGGTATT TACCATCTGT 
GATCTGCTGG ATGAGATTGT ACCCAAAGCG ACTTCTTATC GTGAACAAAT CACTTATGTC 
GCGGATCGTC CGGGCCATGA TCGTCGTTAT GCCATTGATG CAGGTAAAAT TAGCCGCGAA 
TTAGGCTGGA AACCGCTGGA GACCTTTGAA AGCGGTATTC GTAAAACAGT GGAATGGTAC 
CTTGCAAATA CTCAATGGGT AAACAATGTT AAAAGTGGGG CGTATCAGAG TTGGATAGAA 
CAGAACTATG AAGGACGCCA GTAATGAATA TCTTACTTTT TGGTAAGACA GGGCAAGTAG 
GCTGGGAGTT GCAACGTTCT CTGGCACCGG TAGGGAATCT GATTGCCCTG GATGTCCATT 
CAAAAGAGTT TTGCGGTGAT TTTAGTAATC CGAAAGGCGT TGCCGAAACC GTTCGTAAGC 
TTCGTCCCGA TGTGATTGTT AACGCAGCAG CCCATACTGC AGTAGATAAA GCAGAGTCTG 
AACCAGAACT GGCGCAGTTA CTTAACGCCA CCAGTGTGGA AGCCATCGCT AAAGCAGCCA 
ACGAAACTGG CGCATGGGTA GTGCATTATT CAACCGATTA TGTATTTCCT GGTACCGGCG 
ATATCCCATG GCAGGAAACG GACGCTACGT CGCCGCTGAA TGTCTATGGC AAAACCAAAC 
TGGCGGGAGA AAAGGCCCTG CAGGATAACT GCCCTAAACA CCTTATCTTC CGCACCAGTT 
GGGTTTATGC AGGTAAGGGC AATAATTTCG CAAAGACAAT GCTTCGTCTG GCGAAAGAGC 
GTCAGACACT TTCAGTCATT AACGATCAGT ACGGTGCGCC AACCGGTGCG GAATTACTGG 
CTGACTGTAC GGCGCATGCG ATCCGTGTGG CGTTAAATAA ACCAGAAGTC GCAGGTCTTT 
ACCATCTGGT TGCCGGGGGA ACCACAACCT GGCATGACTA CGCGGCCTTA GTCTTTGACG 
AGGCGCGCAA AGCAGGGATA ACGCTTGCGC TGACTGAGCT TAATGCTGTG CCGACCAGCG 
CCTACCCGAC GCCGGCGAGC AGACCAGGCA ATTCGCGTCT CAATACTGAA AAGTTTCAGC 
GTAATTTTGA CCTTATTCTG CCTCAATGGG AATTAGGAGT TAAGCGTATG CTGACTGAAA 
TGTTTACGAC GACAACCATC TAATAAATTT AAATGCCCAT CAGGGCATTT TCTATGAATG 
AGAAATGGAA ATGAAAACGC GTAAGGGCAT TATTTTAGCG GGGGGCTCCG GCACCCGTCT 
TTATCCGGTG ACCATGGCGG TAAGTAAGCA ATTGCTACCA ATTTATGATA AACCGATGAT 
TTACTATCCC CTTTCCACGC TTATGCTGGC AGGCATTCGG GATATCCTGA TCATCAGTAC 
GCCACAGGAC ACGCCGCGTT TTCAACAACT GCTGGGAGAC GGCAGCCAGT GGGGGCTGAA 
TCTTCAATAT AAAGTACAGC CAAGCCCGGA TGGCTTAGCA CAGGCGTTTA TTATTGGTGA 
AGAGTTCATT GGTCATGATG ATTGTGCATT AGTGCTGGGT GACAATATCT TCTATGGTCA 
TGATTTACCA AAGTTAATGG AAGCTGCCGT TAATAAAGAA AGTGGTGCTA CCGTCTTCGC 
TTATCATGTA AACGATCCGG AGCGCTACGG TGTGGTTGAG TTTGACCAAA AGGGCACAGC 
CGTTAGTCTG GAAGAAAAAC CATTACAACC GAAGAGTAAT TACGCGGTAA CGGGGCTGTA 
TTTTTATGAT AATAGCGTGG TGGAGATGGC GAAAAATCTT AAGCCTTCCG CTCGCGGTGA 
GTTAGAAATC ACGGATATTA ACCGTATCTA TATGGAGCAG GGAAGATTGT CTGTCGCTAT 
GATGGGGCGC GGTTATGCCT GGCTGGATAC AGGGACGCAT CAGAGTTTGA TAGAGGCCAG 
TAATTTTATT GCAACCATCG AAGAACGCCA GGGGCTAAAA GTGTCCTGCC CGGAAGAGAT 
CGCATTTCGT AAAAATTTTA TAAATGCACA ACAGGTTATA GAACTGGCCG GGCCATTATC 
AAAAAATGAT TATGGCAAAT ATTTGCTGAA GATGGTGAAA GGTTTATAAG TGATGATTGT 
GATTAAAACA GCAATACCAG ATGTCTTGAT CTTAGAGCCT AAAGTTTTTG GCGATGAGAG 
GGGATTCTTT TTTGAAAGTT ATAACCAGCA GACCTTTGAA GAGTTGATTG GACGTAAAGT 
TACATTTGTT CAAGATAATC ATTCAAAATC CAAAAAGAAC GTACTCAGAG GGCTACATTT 
TCAGAGAGGA GAAAATGCAC AGGGGAAGTT AGTTCGTTGT GCTGTCGGTG AGGTTTTTGA 
TGTTGCGGTC GATATCCGAA AAGAATCGCC TACTTTTGGT CAATGGGTTG GTGTAAATCT 
GTCTGCTGAG AATAAGCGAC AGCTTTGGAT TCCAGAAGGT TTTGCTCATG GTTTTGTTAC 
TCTTAGTGAG TATGCAGAGT TTCTGTACAA AGCAACTAAT TATTACTCAC CTTCATCGGA 
AGGTAGCATT CTATGGAATG ATGAGGCAAT AGGTATTGAA TGGCCTTTTT CTCAGCTGCC 
TGAGCTTTCA GCAAAAGATG CTGCAGCACC TTTACTGGAT CAAGCCTTGT TAACAGAGTA 
AGCATCGTGT CTCATATTAT TAAGATTTTT CCATCAAATA TTGAATTTTC CGGTAGAGAG 
GATGAATCAA TCCTCGATGC TGCGCTATCG GCTGGTATCC ATCTTGAACA TAGCTGCAAA 
GCGGGTGATT GTGGTATCTG TGAGTCCGAT TTGTTGGCGG GAGAAGTTGT TGACTCCAAA 
GGTAATATTT TTGGACAGGG TGATAAAATA CTAACCTGCT GCTGTAAACC TAAAACCGCC 
CTTGAGCTAA ATGCGCATTT TTTTCCTGAA CTAGCTGGAC AGACAAAAAA AATTGTCCCA 
TGCAAGGTAA ATAGTGCTGT ACTGGTTTCA GGCGATGTTA TGACTTTGAA GTTACGCACA 
CCACCAACAG CAAAAATTGG CTTCCTTCCA GGGCAGTATA TCAATTTACA TTATAAAGGT 
GTAACTCGCA GTTATTCTAT CGCTAATAGT GATGAGTCGA ATGGTATTGA GTTGCATGTA 
AGGAATGTTC CCAATGGTCA GATGAGTTCG CTCATTTTTG GGGAGTTACA AGAAAATACT 
CTTATGCGCA TTGAAGGGCC TTGCGGAACA TTTTTTATTC GTGAAAGTGA CAGACCTATA 
ATCTTCCTTG CAGGCGGTAC TGGATTCGCT CCAGTXAAAT CAATGGTTGA GCATCTCATT 
CAGGGAAAAT GTCGTCGTGA GATCTACATT TACTGGGGAA TGCAATATAG TAAAGATTTT 
TACTCTGCAT TACCGCAGCA GTGGAGTGAA CAGCACGACA ACGTTCATTA TATCCCTGTT 
GTTTCTGGTG ATGACGCCGA ATGGGGGGGA AGAAAGGGAT TTGTCCATCA TGCCGTGATG 
GATGATTTTG ATTCTCTAGA GTTCTTCGAT ATATATGCAT GTGGTTCACC TGTGATGATC 
GATGCCAGTA AAAAGGACTT TATGATGAAA AATCTCTCTG TAGAACATTT CTATTCTGAT 
GCATTTACCG CATCTAATAA TATTGAGGAT AATTTATGAA AGCGGTCATC CTGGCTGGTG 
GACTTGGTAC CAGACTAAGT GAAGAAACAA TTGTAAAACC AAAACCGATG GTAGAAATTG 
GTGGCAAGCC TATTCTTTGG CACATTATGA AAATGTATTC TGTGCATGGT ATCAAGGATT 
TTATTATCTG CTGTGGTTAT AAAGGATATG TGATTAAAGA ATATTTTGCG AACTACTTCC 
TTCACATGTC AGATGTAACA TTCCATATGG CTGAAAACCG TATGGAAGTT CACCATAAAC 
GTGTTGAACC ATGGAATGTC ACATTGGTTG ATACGGGTGA TTCTTCAATG ACTGGTGGTC 
GTCTGAAACG TGTTGCTGAA TACGTAAAAG ATGACGAGGC TTTCCTGTTT ACTTATGGTG 
ATGGCGTTGC CGACCTTGAT ATCAAAGCGA CTATCGATTT CCATAAGGCT CACGGTAAGA 
AAGCGACTTT AACAGCTACT TTTCCACCAG GACGCTTTGG CGCATTAGAT ATCCGAGCTG 
GTCAGGTCCG GTCATTCCAG GAAAAACCGA AAGGCGATGG GGCAATGATC AATGGTGGTT 
TCTTTGTGTT GAATCCATCG GTTATCGATC TCATCGATAA CGATGCAACA ACCTGGGAAC 
AAGAGCCATT AATGACATTG GCACAACAGG GGGAGTTAAT GGCTTTTGAA CACCCAGGTT 
TCTGGCAGCC GATGGATACC CTACGTGATA AAGTTTACCT CGAAGGGCTG TGGGAAAAAG 
GTAAAGCTCC GTGGAAAACC TGGGAGTAAC TAGATGATTG ATAAAAATTT TTGGCAAGGT 
AAACGTGTAT TCGTTACCGG CCATACTGGC TTTAAAGGAA GCTGGCTTTC GCTATGGCTG 
ACTGAAATGG GTGCAATTGT AAAAGGCTAT GCACTTGATG CGCCAACTGT TCCAAGTTTA 
TTTGAGATAG TGCGTCTTAA TGATCTTATG GAATCTCATA TTGGCGACAT TCGTGATTTT 
GAAAAGCTGC GCAATTCTAT TGCAGAATTT AAGCCAGAAA TTGTTTTCCA TATGGCAGCC 
CAGCCTTTAG TGCGCCTATC TTATGAACAG CCAATCGAAA CATACTCAAC AAATGTTATG 
GGTACTGTCC ATTTGCTTGA AACAGTTAAG CAAGTAGGTA ACATAAAGGC AGTCGTAAAT 
ATCACCAGTG ATAAGTGCTA CGACAATCGT GAGTGGGTGT GGGGCTATCG TGAGAACGAA 
CCCATGGGAG GGTACGATCC ATACTCTAAT AGTAAAGGTT GTGCAGAATT AGTCGCGTCT 
GCATTCCGGA ACTCATTCTT CAATCCTGCA AATTATGAGC AACATGGCGT TGGTTTGGCG 
TCTGTGAGGG CTGGTAATGT CATAGGCGGA GGCGATTGGG CTAAAGACCG TTTAATTCCC 
GATATTCTGC GCTCATTTGA AAATAACCAG CAGGTTATTA TTCGAAACCC ATATTCTATC 
CGTCCCTGGC AGCATGTACT GGAGCCTCTT TCTGGTTACA TTGTGGTGGC GCAACGCTTA 
TATACAGAAG GTGCTAAGTT TTCTGAAGGA TGGAATTTCG GCCCGCGTGA TGAAGATGCG 
AAGACGGTCG AATTTATTGT TGACAAGATG GTCACGCTTT GGGGTGATGA TGCAAGCTGG 
TTACTGGATG GTGAGAATCA TCCTCATGAG GCACATTACC TGAAACTGGA TTGCTCTAAA 
GCAAATATGC AATTAGGATG GCATCCGCGT TGGGGATTGA CTGAAACACT TGGTCGCATC 
GTAAAATGGC ATAAAGCATG GATTCGCGGC GAAGATATGT TGATTTGTTC AAAGCGTGAA 
ATCAGCGACT ATATGTCTGC AACTACTCGT TAAGAAAATA AGTTTAAGGA ATCAAAGTAA 
TGACAGCAAA TAACCTGCGT GAGCAAATCT CTCAGCTTGT CGCTCAGTAT GCGAATGAGG 
CATTGAGCCC GAAACCTTTT GTTGCAGGTA CAAGCGTTGT GCCTCCTTCC GGGAAGGTTA 
TTGGTGCCAA AGAGTTACAA TTGATGGTTG AGGCGTCTCT TGATGGATGG CTAACTACTG 
GTCGTTTCAA TGATGCCTTT GAAAAAAAAC TTGGGGAATT TATTGGGGTT CCTCATGTTT 
TAACGACAAC ATCTGGCTCT TCGGCAAACT TGCTGGCACT GACTGCGCTG ACTTCCCCAA 
AATTAGGCGA GCGAGCTCTC AAACCTGGTG ATGAGGTTAT TACTGTCGCT GCTGGCTTCC 
CGACTACAGT TAACCCGGCG ATCCAGAATG GTTTAATACC GGTATTCGTG GATGTTGATA 
TCCCGACATA TAATATCGAT GCCTCTCTCA TTGAAGCTGC AGTTACTGAG AAATCAAAAG 
CGATAATGAT CGCTCATACA CTCGGTAATG CATTTAACCT GAGTGAAGTT CGTCGGATTG 
CCGATAAATA TAACTTATGG TTGATTGAAG ACTGCTGTGA TGCCCTTGGG ACGACTTATG 
AAGGCCAGAT GGTAGGTACC TTTGGTGACA TCGGAACCGT TAGTTTTTAT CCGGCTCACC 
ATATCACAAT GGGTGAAGGC GGTGCTGTAT TCACCAAGTC AGGTGAACTG AAGAAAATTA 
TTGAGTCGTT CCGTGACTGG GGCCGGGATT GTTATTGTGC GCCAGGATGC GATAACACCT 11220 
GCGGTAAACG TTTTGGTCAG CAATTGGGAT CACTTCCTCA AGGCTATGAT CACAAATATA 11280 
CTTATTCCCA CCTCGGATAT AATCTCAAAA TCACGGACAT GCAGGCAGCA TGTGGTCTGG 11340 
CTCAGTTGGA GCGCGTAGAA GAGTTTGTAG AGCAGCGTAA AGCTAACTTT TCCTATCTGA 11400 
AACAGGGCTT GCAATCTTGC ACTGAATTCC TCGAATTACC AGAAGCAACA GAGAAATCAG 11460 
ATCCATCCTG GTTTGGCTTC CCTATCACCC TGAAAGAAAC TAGCGGTGTT AACCGTGTCG 11520 
AACTGGTGAA ATTCCTTGAT GAAGCAAAAA TCGGTACACG TTTACTGTTT GCTGGAAATC 11580 
TGATTCGCCA ACCGTATTTT GCTAATGTGA AATATCGTGT AGTGGGTGAG TTGACAAATA 11640 
CCGACCGTAT AATGAATCAA ACGTTCTGGA TTGGTATTTA TCCAGGCTTG ACTACAGAGC 11700 
ATTTAGATTA TGTAGTTAGC AAGTTTGAAG AGTTCTTTGG TTTGAATTTC TAATTCAATT 11760 
TATTCTATCT GGTGATTGCG ATGACCTTTT TGAAAGAATA TGTAATTGTC AGTGGGGCTT 11820 
CCGGCTTTAT TGGTAAGCAT TTACTCGAAG CGCTAAAAAA ATCGGGGATT TCAGTTGTCG 11880 
CAATCACTCG AGATGTAATA AAAAATAATA GTAATGCATT AGCTAATGTT AGATGGTGCA 11940 
GTTGGGATAA TATCGAATTA TTAGTCGAGG AGTTATCAAT TGATTCTGCA TTAATTGGTA 12000 
TCATTCATTT GGCAACAGAA TATGGGCATA AAACATCATC TCTCATAAAT ATTGAAGATG 12060 
CAAATGTTAT AAAACCATTA AAGCTTCTTG ATTTGGCAAT AAAATATCGG GCGGATATCT 
TTTTAAATAC AGATAGTTTT TTTGCCAAGA AAGATTTTAA TTATCAACAT ATGCGGCCTT 12180 
ATATAATTAC TAAAAGACAC TTTGATGAAA TTGGGCATTA TTATGCTAAT ATGCATGACA 12240 
TTTCATTTGT AAACATGCGA TTAGAGCATG TATATGGGCC TGGGGATGGT GAAAATAAAT 12300 
TTATTCCATA CATTATCGAC TGCTTAAATA AAAAACAGAG TTGCGTGAAA TGTACAACAG 12360 
GCGAACAGAT AAGAGACTTT ATTTTTGTAG ATGATGTGGT AAATGCTTAT TTAACTATAT 12420 
TAGAAAATAG AAAAGAAGTA CCTTCATATA CTGAGTATCA AGTTGGAACT GGTGCTGGGG 12480 
TAAGTTTGAA AGATTTTCTG GTTTATTTGC AAAATACTAT GATGCCAGGT TCATCGAGTA 12540 
TATTTGAATT TGGTGCGATA GAGCAAAGAG ATAATGAAAT AATGTTCTCT GTAGCAAATA 12600 
ATAAAAATTT AAAAGCAATG GGCTGGAAAC CAAATTTCGA TTATAAAAAA GGAATTGAAG 12660 
AACTACTGAA ACGGTTATGA GATTTTCATG ATCTTTTAAT AAATAAATCG TTAACAAATT 
AGTCGCGTTA TGTTGTAAAA ACTAAGTCGT TTAATTGCAT AGTGAAAGTT CAATTGTTAA 
AAATTCCGAG TCATTTAATT GTTGCAGGTT CATCATGGTT ATCCAAAATA ATAATTGCCG 
GGGTGCAGTT AGCAAGTATT TCATATCTTA TTTCTATGCT AGGTGAAGAG AAATATGCAA 
TCTTTAGTTT GTTAACTGGT TTATTAGTAT GGTGTAGCGC TGTTGATTTT GGCATAGGTA 
CAGGACTGCA AAATTATATA TCAGAATGCA GAGCCAAAAA CAAAAGTTAT GATGCATATA 
TTAAATCAGC ATTACATCTA AGCTTTATAG CTATTATTTT TTTTATTGCT TTATTTTATA 
TTTTTTCTGG GGTAATTTCC GCTAAATATC TTTCTTCTTT TCATGAGGTA TTACAGGACA 
AAACCAGAAT GCTCTTTTTT ACCTCATGTC TGGTTTTCAG TTCTATTGGA ATCGGAGCTA 
TTGCTTATAA AATACTTTTT GCCGAATTGG TCGGGTGGAA AGCTAATCTA TTAAACGCAT 
TATCTTATAT GATAGGTATG CTCGGCTTGC TATATATATA CTATAGGGGG ATCTCAGTTG 
ACATAAAATT ATCACTAATA GTCCTGTATC TTCCAGTGGG TATGATTTCA TTGTGCTATA 
TTGTATATAG ATACATAAAG CTTTATCATG TTAAAACAAC AAAATCTCAT TATATAGCAA 
TTTTACGTAG ATCTTCAGGG TTTTTTCTTT TTACTTTATT ATCGATAGTG GTGCTTCAAA 
CAGATTATAT GGTCATTTCT CAAAGGCTAA CTCCTGCTGA TATTGTTCAA TATACAGTAA 
CGATGAAAAT TTTTGGTTTA GTCTTTTTTA TTTATACTGC TATTTTGCAA GCATTATGGC 
CTATATGTGC TGAATTGAGA GTCAAACAGC AATGGAAAAA ACTTAACAAA ATGATAGGTG 
TCAATATTTT GCTTGGCTCA CTATATGTTG TTGGATGTAC AATATTTATT TATTTATTTA 
AAGAACAGAT ATTTTCAGTA ATAGCCAAAG ATATTAATTA TCAAGTTTCT ATTTTATCTT 13800 
TTATGTTAAT TGGCATATAT TTCTGTATTC GCGTTTGGTG TGACACTTAT GCAATGTTAT 13860 
TGCAAAGTAT GAATTATTTA AAAATACTTT GGATATTAGT ACCACTACAA GCAATAATTG 13920 
GTGGAATAGC ACAATGGTAT TTTTCTAGTA CGCTTGGAAT CAGTGGAGTG CTGCTTGGCT 13980 
TGATTATATC TTTTGCTTTA ACTGTTTTTT GGGGGCTTCC ACTAACTTAC TTAATTAAGG 14040 
CAAATAAGGG ATAATCATAT GCTTATATCA TTTTGTATTC CAACTTATAA TAGAAAACAA 14100 
TATCTTGAAG AGTTGTTGAA TAGTATAAAT AATCAGGAAA AATTTAATTT AGATATTGAG 14160 
ATATGTATAT CAGATAATGC CTCTACTGAT GGTACAGAGG AAATGATTGA TGTTTGGAGG 
AACAATTATA ATTTCCCAAT AATATATCGG CGTAATAGCG TTAACCTTGG GCCAGATAGG 
AATTTTCTTG CTTCAGTATC CCTTGCGAAT GGGGATTATT GTTGGATATT TGGCAGTGAT 
GATGCTCTTG CGAAAGACTC GTTAGCGATA TTACAAACTT ATCTCGATTC TCAAGCAGAT 
ATATATTTAT GTGACAGAAA AGAGACCGGG TGTGATTTAG TTGAGATTAG AAACCCTCAT 

CGTTCTTGGC TCAGAACAGA TGATGAACTT TATGTGTTTA ATAATAATTT AGATAGGGAA 14520 

ATCTATCTCA GTAGATGCTT ATCTATTGGT GGTGTATTTA GCTATCTAAG TTCTTTAATA 14580 
GTAAAAAAAG AACGATGGGA TGCCATTGAT TTTGATGCGT CCTATATTGG CACTTCCTAT 
CCTCATGTAT TTATCATGAT GAGCGTATTT AATACGCCAG GGTGCCTTTT GCATTATATA 
TCAAAACCAC TCGTAATATG CCGAGGAGAT AATGATAGTT TCGAGAAGAA AGGAAAGGCC 

AGACGAATTT TAATTGATTT TATTGCATAT TTAAAATTAG CTAATGATTT TTACAGTAAA 14820 

AATATATCTT TAAAACGAGC ATTTGAAAAT GTTTTGCTAA AAGAGAGACC ATGGTTATAT 14880 

ACAACTTTGG CTATGGCATG TTATGGCAAT AGTGATGAAA AAAGAGATTT ATCTGAATTT 14940 

TATGCAAAGC TAGGTTGTAA TAAAAATATG ATCAACACTG TACTTCGATT TGGGAAACTA 15000 

GCATATGCAG TGAAAAATAT TACCGTGCTT AAGAATTTTA CTAAACGGAT AATTAAGTAG 15060 

TAGTAAGTTA TTATATTGAG ATTAAATGTA GATTTAACCT TTCTGGATTC AGCTAGATTT 15120 

ACGTTACTGA CTTTTCTTTT TAATGAAAAT CATATTTGAT ATATATAAAT AAATTTGGAT 15180 
AGCTTAACTA CTTAGATGTT TTTTTCTGGG AATGTTAGTA TAATAATATA TTTCTTTATG 
ATTGTTTTTG TAGTGTTTTA CTGCCGGTAT TACATTAACT CTATTATTAA GAATTACACC 
TAGTGTAAGC TTCGTAATAT TATTTATCCT TATGATTATT GCTTTAAAGA TGCGTATGGA 

AAAACGGAGA GCTATTCAAT GATCGTAAAC CTATCACGTT TAGGTAAAAG TGGTACGGGA 15420 

ATGTGGCAAT ACTCGATTAA ATTTTTAACG GCACTGCGAG AAATAGCTGA TGTTGACGCA 15480 

ATAATCTGTA GCAAGGTACA CGCTGATTAT TTTGAAAAGC TCGGTTATGC AGTAGTTACT 15540 

GTTCCGAATA TTGTTAGCAA CACATCAAAA ACATCGCGAC TTAGACCATT AGTATGGTAT 15600 

GTATATAGTT ACTGGCTTGC GCTGAGGGTT TTAATTAAGT TTGGTAATAA AAAATTGGTG 15660 

TGTACTACAC ATCACACTAT CCCCTTACTG AGAAACCAAA CGATAACCGT ACATGATATA 15720 

AGACCTTTTT ATTATCCAGA TAGTTTTATT CAGAAAGTGT ATTTTCGCTT TTTATTAAAA 15780 

ATGTCCGTTA AGCGATGTAA GCATGTTTTA ACGGTATCTT ATACCGTTAA AGATAGCATT 15840 

GCTAAAACTT ATAATGTAGA TAGTGAGAAA ATATCAGTAA TTTATAATAG TGTTAATAAA 15900 

TCTGATTTTA TACAAAAAAA AGAAAAAGAG AATTACTTTT TAGCTGTTGG TGCAAGTTGG 15960 

CCACATAAAA ATATTCATTC ATTCATAAAA AATAAAAAAG TTTGGTCTGA CTCTTATAAT 16020 

TTAATTATTG TATGTGGTCG TACTGACTAT GCAATGTCTC TCCAACAAAT GGTCGTTGAT 16080 

CTGGAACTAA AAGATAAAGT GACTTTTTTA CATGAAGTCT CATTTAATGA ATTAAAGATT 16140 

TTATATTCTA AAGCCTACGC GCTTGTTTAT CCATCTATTG ATGAGGGTTT TGGTATACCT 16200 
CCTATTGAAG CGATGGCATC AAATACTCCA GTTATAGTGT CCGATATACC AGTATTTCAT 
GAAGTGTTAA CCAATGGTGC ATTATATGTG AATCCGGATG ATGAAAAAAG CTGGCAGAGT 
GCAATTAAAA ATATAGAGCA GTTGCCTGAT GCAATTTCCC GATTTAACAA CTATGTCGCA 
CGGTATGACT TTGATAATAT GAAGCAGATG GTTGGCAATT GGTTGGCGGA ATCAAAATAA 
ATGAAAATAA CATTAATTAT TCCCACATAT AATGCAGGGT CGCTTTGGCC TAATGTTCTG 
GATGCGATTA AGCAGCAAAC TATATATCCG GATAAATTGA TTGTTATAGA CTCAGGTTCT 
AAAGATGAAA CGGTTCCGTT AGCCTCAGAC CTGAAAAATA TATCAATATT TAATATTGAC 
TCTAAAGATT TTAATCATGG AGGAACCAGA AATTTAGCAG TTGCAAAAAC TCTGGACGCT 
GATGTTATAA TTTTTCTAAC GCAAGATGCA ATTCTCGCGG ATTCGGATGC AATTAAAAAT 
TTGGTTTATT ATTTTTCAGA TCCATTGATA GCAGCGGTTT GTGGTAGACA ACTTCCTCAT 
AAAGATGCTA ATCCCCTTGC AGTGCATGCC AGAAATTTTA ATTATAGTTC AAAATCTATT 
GTTAAAAGTA AGGCAGATAT AGAAAAATTG GGTATTAAAA CTGTATTTAT GTCCAATTCT 
TTTGCTGCCT ATCGCCGTTC CGTTTTTGAA GAGTTAAGTG GGTTTCCTGA ACATACAATT 
CTTGCCGAGG ATATGTTTAT GGCGGCTAAG ATGATTCAGG CGGGTTATAA GGTCGCCTAC 
TGCGCTGAAG CGGTGGTAAG ACACTCCCAT AATTATACCC CGCGAGAAGA GTTTCAACGA 
TATTTTGATA CTGGTGTATT TCATGCTTGT TCTCCGTGGA TTCAGCGTGA CTTTGGCGGA 
GCCGGTGGTG AGGGTTTCCG CTTCGTAAAA TCAGAGATTC AATTCCTGCT TAAAAATGCA 
CCGTTCTGGA TTCCAAGAGC TTTATTAACA ACCTTTGCTA AATTCTTGGG TTACAAATTA 
GGCAAGCATT GGCAATCTTT ACCGTTGTCT ACATGTCGCT ATTTTAGCAT GTACAAGAGT 
TATTGGAATA ATATCCAATA TTCTTCGTCA AAAGAGATAA AATAAATGTC TTTTCTTCCC 
GTAATTATGG CTGGCGGCAC AGGTAGCCGT TTATGGCCGC TTTCACGCGA ATATCATCCG 
AAGCAGTTTC TAAGCGTTGA AGGTAAACTA TCAATGCTGC AAAATACTAT AAAGCGATTA 
GCTTCACTTT CTACAGAAGA ACCCGTTGTC ATTTGCAATG ACAGACACCG TTTCTTAGTC 17580 
GCTGAACAAC TCCGTGAAAT TGACAAGTTA GCAAATAATA TTATTCTCGA ACCGGTAGGC 17640 
CGTAATACTG CACCAGCGAT CGCTCTTGCC GCGTTTTGTG CGCTCCAGAA TGCTGATAAT 17700 
GCTGATCCTC TTTTGTTGGT TCTTGCTGCA GATCATGTGA TTCAGGATGA AATAGCTTTT 
ACGAAAGCTG TCAGACATGC TGAAGAATAC GCTGCAAATG GTAAGCTTGT AACTTTTGGT 
ATTGTTCCAA CGCATGCTGA AACGGGTTAT GGATATATTC GTCGTGGTGA GTTGATAGGA 
AATGACGCTT ATGCAGTGGC TGAATTTGTG GAGAAACCGG ATATCGATAC CGCCGGTGAC 17940 
TATTTCAAAT CAGGGAAATA TTACTGGAAT AGCGGTATGT TTTTATTTCG TGCAAGCTCT 18000 
TATTTAAACG AATTAAAGTA TTTATCACCT GAAATTTATA AAGCTTGTGA AAAGGCGGTA 18060 
GGACATATAA ATCCCGATCT TGATTTTATT CGTATTGATA AAGAAGAGTT TATGTCATGC 18120 
CCGAGTGATT CTATCGATTA TGCAGTTATG GAGCACACAC AGCATGCGGT GGTGATACCA 
ATGAGCGCTG GCTGGTCGGA TGTGGGTTCC TGGTCCTCAC TTTGGGATAT ATCGAATAAA 
GATCATCAGA GAAATGTTTT AAAAGGAGAT ATTTTCGCAC ATGCTTGTAA TGATAATTAC 18300 

ATTTATTCCG AAGATATGTT TATAAGTGCG ATTGGTGTAA GCAATCTTGT CATTGTTCAA 18360 

ACAACAGACG CTTTACTGGT GGCTAATAAA GATACAGTAC AAGATGTTAA AAAAATTGTC 18420 

GATTATTTAA AACGGAATGA TAGGAACGAA TATAAACAAC ATCAAGAAGT TTTCCGCCCC 18480 

TGGGGAAAAT ATAATGTGAT TGATAGCGGC AAAAATTACC TCGTTCGATG TATCACTGTT 18540 

AAGCCGGGTG AGAAATTTGT GGCGCAGATG CATCACCACC GGGCTGAGCA TTGGATAGTA 18600 

TTATCCGGGA CTGCTCGTGT TACAAAGGGA GAGCAGACTT ATATGGTTTC TGAAAATGAA 18660 

TCAACATTTA TTCCTCCGAA TACTATTCAC GCGCTGGAAA ATCCTGGAAT GACCCCCCTG 18720 

AAGTTAATTG AGATTCAATC AGGTACCTAT CTTGGTGAGG ATGATATTAT TCGTTTAGAA 18780 

CAACGTTCTG GATTTTCGAA GGAGTGGACT AATGAACGTA GTTAATAATA GCCGTGATGT 18840 

TATTTATTCA TCAGGTATTG TGTTTGGAAC GAGTGGGGCT CGCGGTCTTG TAAAAGATTT 18900 

TACACCTCAG GTATGTGCTG CTTTTACGGT TTCATTTGTT GCCGTTATGC AGGAACATTT 18960 

TTCCTTTGAT ACCGTAGCAT TGGCAATAGA TAATCGTCCA AGTAGTTATG GGATGGCTCA 19020 

GGCGTGTGCT GCTGCATTGG CGGATAAAGG CGTTAACTGT ATTTTTTATG GAGTGGTACC 19080 

AACCCCAGCT TTGGCCTTTC AGTCTATGTC TGACAATATG CCTGCGATAA TGGTTACGGG 19140 

AAGTCATATT CCATTCGAGC GGAACGGCCT CAAGTTTTAT CGTCCTGATG GTGAAATCAC 19200 

GAAACATGAT GAGGCTGCGA TCCTTAGTGT TGAAGATACG TGCAGCCATT TAGAGCTTAA 19260 

AGAACTCATA GTTTCAGAAA TGGCTGCTGT TAATTATATA TCTCGTTATA CATCTTTATT 19320 

TTCTACTCCA TTCCTGAAAA ATAAGCGTAT TGGTATTTAC GAACATTCAA GCGCTGGGCG 19380 

TGATCTTTAT AAGCCTTTAT TTATTGCATT GGGGGCTGAA GTCGTTAGCT TGGGTAGAAG 19440 

CGATAATTTT GTACCTATAG ATACAGAGGC TGTAAGCAAA GAGGATCGGG AAAAAGCTCG 19500 

CTCATGGGCT AAAGAGTTCG ATTTAGATGC CATATTCTCG ACAGATGGGG ATGGTGATCG 19560 

CCCTCTTATT GCTGATGAGG CCGGTGAGTG GCTAAGAGGC GATATACTAG GTCTATTATG 19620 

TTCACTTGCA TTGGATGCAG AAGCCGTCGC TATTCCTGTT AGTTGTAACA GCATAATTTC 19680 

TTCTGGCCGC TTTTTTAAAC ATGTTAAGCT TACAAAAATT GGCTCGCCTT ATGTTATCGA 19740 

AGCTTTTAAT GAATTATCGC GGAGTTATAG TCGTATTGTC GGTTTTGAAG CCAATGGCGG 19800 

TTTTTTATTA GGAAGCGACA TCTGTATTAA CGAGCAGAAT CTTCATGCCT TACCAACTCG 19860 

TGATGCTGTA TTACCAGCAA TAATGCTGCT TTACAAAAGT AGGAATACCA GCATTAGCGC 19920 

TTTAGTCAAT GAACTCCCAA CTCGTTACAC CCATTCTGAC AGATTACAGG GGATTACAAC 19980 

TGATAAAAGT CAATCCTTAA TTAGTATGGG CAGAGAAAAT CTGAGCAACC TCTTAAGCTA 20040 

TATTGGTTTG GAGAATGAAG GTGCAATTTC TACAGATATG ACAGATGGTA TGCGAATTAC 20100 

TTTACGTGAT GGATGTATTG TGCATTTGCG CGCTTCTGGT AATGCACCTG AGTTACGCTG 20160 

CTATGCAGAA GCTAATTTAT TAAATAGGGC TCAGGATCTT GTAAATACAA CGCTTGCTAA 20220 

TATTAAAAAA CGATGCTTGC TGTAAAAAAA TTGAATGTTA TTTACTTAAT ATGCCTATTT 20280 
TATTTACATT ATGCACGGTC AGAGGGTGAG GATTAAATGG ATAATATTGA TAATAAGTAT 
AATCCACAGC TATGTAAAAT TTTTTTGGCT ATATCGGATT TGATTTTTTT TAATTTAGCC 
TTATGGTTTT CATTAGGATG TGTCTATTTT ATTTTTGATC AAGTACAGCG ATTTATTCCT 
CAAGACCAAT TAGATACAAG AGTTATTACG CATTTTATTT TGTCAGTAGT ATGTGTCGGT 
TGGTTTTGGA TTCGTTTGCG ACATTATACT ATCCGCAAGC CATTTTGGTA TGAGTTAAAA 
GAAATTTTTC GTACGATCGT TATTTTTGCT ATATTTGATT TGGCTCTGAT AGCGTTTACA 
AAATGGCAGT TTTCACGCTA TGTCTGGGTG TTTTGTTGGA CTTTTGCCCT AATCCTGGTG 
CCTTTTTTTC GCGCACTTAC AAAGCATTTA TTGAACAAGC TAGGTATCTG GAAGAAAAAA 
ACTATCATCC TGGGGAGCGG ACAGAATGCT CGTGGTGCAT ATTCTGCGCT GCAAAGTGAG 
GAGATGATGG GGTTTGATGT TATCGCTTTT TTTGATACGG ATGCGTCAGA TGCTGAAATA 
AATATGTTGC CGGTGATAAA GGATACTGAG ATTATTTGGG ATTTAAATCG TACAGGTGAT 
GTCCATTATA TCCTTGCTTA TGAATACACC GAGTTGGAGA AAACACATTT TTGGCTACGT 
GAACTTTCAA AACATCATTG TCGTTCTGTT ACTGTAGTCC CCTCGTTTAG AGGATTGCCA 
TTATATAATA CTGATATGTC TTTTATCTTT AGCCATGAAG TTATGTTATT AAGGATACAA 
AATAACTTGG CTAAAAGGTC GTCCCGTTTT CTCAAACGGA CATTTGATAT TGTTTGTTCA 
ATAATGATTC TTATAATTGC ATCACCACTT ATGATTTATC TGTGGTATAA AGTTACTCGA 
GATGGTGGTC CGGCTATTTA TGGTCACCAG CGAGTAGGTC GGCATGGAAA ACTTTTTCCA 
TGCTACAAAT TTCGTTCTAT GGTTATGAAT TCTCAAGAGG TACTAAAAGA ACTTTTGGCT 
AACGATCCTA TTGCCAGGGC TGAATGGGAG AAAGATTTTA AACTGAAAAA TGATCCTCGA 
ATCACAGCTG TAGGTCGATT TATACGTAAA ACTAGCCTTG ATGAGTTGCC ACAACTTTTT 
AATGTACTAA AAGGTGATAT GAGCCTGGTT GGACCACGAC CTATCGTTTC GGATGAACTG 
GAGCGTTATT GTGATGATGT TGATTATTAT TTGATGGCAA AGCCGGGCAT GACAGGTCTA 
TGGCAAGTGA GTGGGCGTAA TGATGTTGAT TATGACACTC GTGTTTATTT TGATTCCTGG 
TATGTTAAAA ACTGGACGCT TTGGAATGAT ATTGCCATTC TGTTTAAAAC AGCGAAAGTT 
GTTTTGCGGC GAGATGGTGC GTATTAAGCT TACCGAGAAG TACTGAATAA TAATTGTATA 
AATTAGCCTG CGTAAAATCT GAACGCATCA ATCGCTACCT TAATATCATA CCTTTGAGTT 
AACATACTAT TCACCTTTAA CCTGCCATGA CCGTTTGTGG CAGGGTTTCC ACACCTGACA 
GGAGTATGTA ATGTCCAAGC AACAGATCGG CGTCGTCGGT ATGGCAGTGA TGGGGCGCAA 
CCTCGCGCTC AACATCGAAA GCCGTGGTTA TACCGTCTCC GTTTTCAACC GCTCCCGTGA 22020 
AAAGACCGAA GAAGTGATTG CCGAGAATCC CGGCAAAAAG CTGGTGCCTT ATTACACGGT 22080 
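The sequence listing above uses the usual layout of six 10-nucleotide blocks per row, and many rows end with the position of their last base, so consecutive numbered rows differ by 60. The listing was recovered by OCR, and some rows still contain characters that are not bases (for example X) or damaged position numbers. The Python sketch below is not part of the specification. It flags such rows, and its helper names, the 60-base row width and the accepted alphabet (A, C, G, T and the IUPAC code N) are assumptions.

```python
# Sketch: flag damaged rows in an OCR'd sequence listing.
# Assumptions (not from the specification): full rows hold 60 bases, a
# trailing integer is the position of the row's last base, and only the
# uppercase letters A, C, G, T and N (IUPAC "any base") are valid.

VALID_BASES = set("ACGTN")


def parse_row(line):
    """Split a listing row into (bases, end_position or None)."""
    tokens = line.split()
    position = None
    if tokens and tokens[-1].isdigit():
        position = int(tokens.pop())
    # Joining all remaining tokens also repairs blocks split by stray spaces.
    return "".join(tokens), position


def check_rows(lines, row_width=60):
    """Return (row_index, problem) pairs for rows that look damaged.

    Blank lines are skipped. The short final row of a sequence is also
    reported, so a length warning on that row may be expected.
    """
    problems = []
    end = None  # running estimate of the position of the last base seen
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        bases, position = parse_row(line)
        bad = sorted(set(bases) - VALID_BASES)
        if bad:
            problems.append((i, f"unexpected characters {bad}"))
        if len(bases) != row_width:
            problems.append((i, f"{len(bases)} bases instead of {row_width}"))
        if end is not None:
            end += row_width
        if position is not None:
            if end is not None and position != end:
                problems.append((i, f"position {position}, expected {end}"))
            end = position
    return problems


good_rows = [
    "GCGGTAAACG TTTTGGTCAG CAATTGGGAT CACTTCCTCA AGGCTATGAT CACAAATATA 11280",
    "CTTATTCCCA CCTCGGATAT AATCTCAAAA TCACGGACAT GCAGGCAGCA TGTGGTCTGG 11340",
]
# A row exactly as OCR'd, with split blocks and an unreadable base.
damaged_row = "ATCTTCCTTG CAGGCGGTAC TGGATTCGCT CCAGTXAAAT CAATGGTTGA GCAT CTC ATT"
print(check_rows(good_rows))
print(check_rows([damaged_row]))
```

Rows are checked independently of blank lines, so the same function works on the listing whether or not the extraction kept the spacing between rows.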
THE CLAIMS: 

1. A nucleic acid molecule derived from: a gene
encoding a transferase; or a gene encoding an enzyme for
the transport or processing of a polysaccharide or
oligosaccharide unit, including a wzx gene or a wzy gene,
or a gene with a similar function; the gene being involved
in the synthesis of a particular bacterial polysaccharide
antigen, wherein the sequence of the nucleic acid molecule
is specific to the particular bacterial polysaccharide
antigen.

2. A nucleic acid molecule derived from: a gene
encoding a transferase; or a gene encoding an enzyme for
the transport or processing of a polysaccharide or
oligosaccharide unit such as a wzx or wzy gene; the gene
being involved in the synthesis of a particular bacterial
O antigen, wherein the sequence of the nucleic acid
molecule is specific to the particular bacterial O
antigen.

3. A nucleic acid molecule derived from: a gene
encoding a transferase; or a gene encoding an enzyme for
the transport or processing of a polysaccharide or
oligosaccharide unit such as a wzx or wzy gene; the gene
being involved in the synthesis of an O antigen expressed
by E. coli, wherein the sequence of the nucleic acid
molecule is specific to the O antigen.

4. A nucleic acid molecule derived from: a gene
encoding a transferase; or a gene encoding an enzyme for
the transport or processing of a polysaccharide or
oligosaccharide unit such as a wzx or wzy gene; the gene
being involved in the synthesis of an O antigen expressed
by S. enterica, wherein the sequence of the nucleic acid
molecule is specific to the O antigen.

5. A nucleic acid molecule according to any one
of claims 1 to 4 wherein the nucleic acid molecule is



PCT/AU98/00315
WO 98/50531

approximately 10 to 20 nucleotides in length.
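Claims 5 and 36 limit the molecules to approximately 10 to 20 nucleotides. As an illustration only, the Python sketch below enumerates every window of that length from a gene sequence. It keeps the windows whose melting temperature falls in a chosen range, estimating Tm with the Wallace rule (2 degrees C per A or T plus 4 degrees C per G or C). The Wallace rule, the 50 to 62 degrees C window and the helper names are assumptions, not taken from this specification. The example input is a 40-nucleotide stretch copied from the sequence listing.

```python
# Sketch: enumerate candidate probes of 10 to 20 nucleotides (claims 5 and 36)
# and estimate Tm with the Wallace rule. The rule, the Tm window and the
# helper names are illustrative assumptions, not taken from the specification.


def wallace_tm(oligo):
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def candidate_probes(gene, min_len=10, max_len=20, tm_window=(50, 62)):
    """Yield (start, oligo, tm) for windows whose Tm lies in tm_window.

    start is 1-based so that it matches sequence-listing numbering.
    """
    gene = gene.upper()
    low, high = tm_window
    for length in range(min_len, max_len + 1):
        for i in range(len(gene) - length + 1):
            oligo = gene[i:i + length]
            tm = wallace_tm(oligo)
            if low <= tm <= high:
                yield i + 1, oligo, tm


# 40 nucleotides copied from the sequence listing (four 10-nt blocks of one row).
FRAGMENT = "GAAGATACTTATTACTGGCGGGGCAGGTTTTATTGGATCA"
probes = list(candidate_probes(FRAGMENT))
print(len(probes), "candidates; first:", probes[0] if probes else None)
```

With these defaults no 10-mer can qualify, because the Wallace estimate for 10 nucleotides is at most 40 degrees C. A lower window would be needed to keep the shortest probes, and a nearest-neighbour Tm model would give more realistic values.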

6. A nucleic acid molecule derived from a gene,
the gene being selected from a group consisting of the
following sequences:

nucleotide position 739 to 1932 of SEQ ID NO:1;
nucleotide position 8646 to 9911 of SEQ ID NO:1;
nucleotide position 9901 to 10953 of SEQ ID NO:1;
nucleotide position 11821 to 12945 of SEQ ID NO:1;
nucleotide position 79 to 861 of SEQ ID NO:2;
nucleotide position 858 to 2042 of SEQ ID NO:2;
nucleotide position 2011 to 2757 of SEQ ID NO:2;
nucleotide position 2744 to 4135 of SEQ ID NO:2;
nucleotide position 5257 to 6471 of SEQ ID NO:2; and
nucleotide position 13156 to 13821 of SEQ ID NO:2;

which nucleic acid molecule is capable of hybridizing to
complementary sequence from said gene.
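The positions in claim 6 are 1-based and inclusive, so a range from start to end covers end - start + 1 nucleotides. The Python sketch below extracts such a range and checks its length. Its helper names are hypothetical, and the ranges are copied from the claim. Every range in claim 6 spans a whole number of codons; for example, 739 to 1932 is 1194 nucleotides, or 398 codons. Some neighbouring ranges overlap by a few nucleotides: 858 to 2042 begins before 79 to 861 ends, which is consistent with adjacent reading frames sharing bases.

```python
# Sketch: work with the 1-based, inclusive ranges listed in claim 6.
# Helper names are hypothetical; the ranges are copied from the claim.

CLAIM6_REGIONS = [
    (1, 739, 1932), (1, 8646, 9911), (1, 9901, 10953), (1, 11821, 12945),
    (2, 79, 861), (2, 858, 2042), (2, 2011, 2757), (2, 2744, 4135),
    (2, 5257, 6471), (2, 13156, 13821),
]


def region_length(start, end):
    """Number of nucleotides in a 1-based inclusive range."""
    return end - start + 1


def extract_region(seq, start, end):
    """Return the bases from start to end inclusive (1-based coordinates)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} lies outside a {len(seq)} nt sequence")
    return seq[start - 1:end]  # Python slices are 0-based and end-exclusive


for seq_id, start, end in CLAIM6_REGIONS:
    n = region_length(start, end)
    print(f"SEQ ID NO:{seq_id} {start}-{end}: {n} nt, {n // 3} codons, remainder {n % 3}")
```

The explicit range check turns a coordinate mistake into an error instead of a silently truncated Python slice.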



7. A nucleic acid molecule which is any one of
the oligonucleotides in Table 5 or 5A, with respect to the
genes wbdH, wzx, wzy and wbdM. 

8. A nucleic acid molecule which is any one of
the oligonucleotides in Table 6 or 6A.



9. A nucleic acid molecule derived from a gene,
the gene being selected from a group consisting of the
following sequences:

nucleotide position 1019 to 2359 of SEQ ID NO:3;
nucleotide position 2352 to 3314 of SEQ ID NO:3;
nucleotide position 3361 to 3875 of SEQ ID NO:3;
nucleotide position 3977 to 5020 of SEQ ID NO:3;
nucleotide position 5114 to 6313 of SEQ ID NO:3;
nucleotide position 6313 to 7323 of SEQ ID NO:3;
nucleotide position 7310 to 8467 of SEQ ID NO:3;
nucleotide position 12762 to 14054 of SEQ ID NO:4; and
nucleotide position 14059 to 15060 of SEQ ID NO:4;
which nucleic acid molecule is capable of hybridizing to
complementary sequences from said gene.

10. A nucleic acid molecule which is any one of 
the oligonucleotides in Table 7. 

11. A nucleic acid molecule which is any one of 
the oligonucleotides in Table 8 with respect to the genes 
wzx and wbaV. 

12. A method of testing a sample for the presence
of one or more bacterial polysaccharide antigens, the
method comprising the following steps:
(a) contacting the sample with at least one
oligonucleotide molecule capable of specifically
hybridising to: (i) a gene encoding a transferase, or (ii)
a gene encoding an enzyme for transport or processing of
oligosaccharide or polysaccharide units, including a wzx
or wzy gene; wherein said gene is involved in the
synthesis of the bacterial polysaccharide antigen; under
conditions suitable to permit the at least one
oligonucleotide molecule to specifically hybridise to at
least one such gene of any bacteria expressing the
bacterial polysaccharide antigen present in the sample and
(b) detecting any specifically hybridised oligonucleotide
molecules.

13. The method according to claim 12, the method 
further comprising contacting the sample with a further at 
least one oligonucleotide molecule capable of specifically 
hybridising to at least one sugar pathway gene under 
conditions suitable to permit the further at least one 
oligonucleotide molecule to specifically hybridise to at 
least one such sugar pathway gene of any bacteria 
expressing the bacterial polysaccharide antigen present in 
the sample and detecting any specifically hybridised
oligonucleotide molecules.
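The specific hybridisation in claims 12 and 13 is a laboratory step whose stringency depends on temperature, salt concentration and probe design. For screening candidate probes in silico, a crude stand-in is to ask whether the probe, or its reverse complement, occurs exactly in a target sequence. The Python sketch below does only that. It ignores mismatches, and it is an assumption for illustration, not the method of the specification. The function names are hypothetical, and the target is a 40-nucleotide stretch copied from the sequence listing.

```python
# Sketch: exact-match stand-in for probe hybridisation (claims 12 and 13).
# It ignores mismatches, stringency and secondary structure; names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence (returned in uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_hits(probe, target):
    """Return (orientation, 1-based start) for every exact match.

    "forward": the probe sequence itself occurs in target, so the probe would
    pair with the complementary strand. "reverse": the probe's reverse
    complement occurs in target, so the probe would pair with target as
    written. A palindromic probe is reported in both orientations.
    """
    probe, target = probe.upper(), target.upper()
    hits = []
    for orientation, query in (("forward", probe), ("reverse", reverse_complement(probe))):
        pos = target.find(query)
        while pos != -1:
            hits.append((orientation, pos + 1))
            pos = target.find(query, pos + 1)
    return hits


FRAGMENT = "GAAGATACTTATTACTGGCGGGGCAGGTTTTATTGGATCA"  # from the sequence listing
print(probe_hits("TTACTGGCGG", FRAGMENT))
print(probe_hits("CCGCCAGTAA", FRAGMENT))
```

An exact-match search like this is useful mainly for checking that a candidate probe is unique within a sequence. Judging specificity against other O antigen gene clusters would need an alignment tool that scores mismatches.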

14. A method of testing a sample for the presence
of one or more bacterial polysaccharide antigens, the
method comprising the following steps:
(a) contacting the sample with at least one pair of
oligonucleotide molecules, with at least one
oligonucleotide molecule of the pair capable of
specifically hybridising to: (i) a gene encoding a
transferase, or (ii) a gene encoding an enzyme for
transport or processing of oligosaccharide or
polysaccharide units, including a wzx or wzy gene; wherein
the gene is involved in the synthesis of the bacterial
polysaccharide antigen; under conditions suitable to
permit the at least one oligonucleotide molecule of the
pair of molecules to specifically hybridise to at least
one such gene of any bacteria expressing the bacterial
polysaccharide antigen present in the sample and
(b) detecting any specifically hybridised oligonucleotide
molecules.

15. The method according to claim 14, the method
further comprising contacting the sample with a further at
least one pair of oligonucleotide molecules, with at least
one oligonucleotide molecule of the pair capable of
specifically hybridising to at least one sugar pathway
gene under conditions suitable to permit the further at
least one oligonucleotide molecule of the pair to
specifically hybridise to at least one such sugar pathway
gene of any bacteria expressing the bacterial
polysaccharide antigen present in the sample and detecting
any specifically hybridised oligonucleotide molecules.

16. A method of testing a sample for the presence
of one or more bacterial O antigens, the method comprising
the following steps:
(a) contacting the sample with at least one
oligonucleotide molecule capable of specifically
hybridising to: (i) a gene encoding an O antigen
transferase, or (ii) a gene encoding an enzyme for
transport or processing of the oligosaccharide or
polysaccharide units, including a wzx or wzy gene; wherein
said gene is involved in the synthesis of the bacterial O
antigen; under conditions suitable to permit the at least
one oligonucleotide molecule to specifically hybridise to
at least one such gene of any bacteria expressing the
bacterial O antigen present in the sample and
(b) detecting any specifically hybridised oligonucleotide
molecules.



17. The method according to claim 16, the method
further comprising contacting the sample with a further at
least one oligonucleotide molecule capable of specifically
hybridising to at least one sugar pathway gene under
conditions suitable to permit the further at least one
oligonucleotide molecule to specifically hybridise to at
least one such sugar pathway gene of any bacteria
expressing the bacterial O antigen present in the sample
and detecting any specifically hybridised oligonucleotide
molecules.


18. The method according to claim 16 or 17 wherein
the O antigen is expressed by E. coli or S. enterica.

19. The method according to claim 18 wherein the
E. coli express the O157 O antigen serotype or the O111 O
antigen serotype.

20. The method according to claim 18 wherein the 
S. enterica express the C2 or B O antigen serotype.



21. The method according to any one of claims 16 
to 20 wherein the specifically hybridised oligonucleotide 
molecules are detected by Southern blot analysis. 



22. A method of testing a sample for the presence
of one or more bacterial O antigens, the method comprising
the following steps:
(a) contacting the sample with at least one pair of
oligonucleotide molecules, with at least one
oligonucleotide molecule of the pair being capable of
specifically hybridising to: (i) a gene encoding an O
antigen transferase, or (ii) a gene encoding an enzyme for
transport or processing of oligosaccharide or
polysaccharide units, including a wzx or wzy gene; wherein
the gene is involved in the synthesis of the bacterial O
antigen; under conditions suitable to permit the at least
one oligonucleotide molecule of the pair of molecules to
specifically hybridise to at least one such gene of any
bacteria expressing the bacterial O antigen present in the
sample and
(b) detecting any specifically hybridised oligonucleotide
molecules.

23. The method according to claim 22, the method
further comprising contacting the sample with a further at
least one pair of oligonucleotide molecules, with at least
one oligonucleotide molecule of the pair capable of
specifically hybridising to at least one sugar pathway
gene under conditions suitable to permit the further at
least one oligonucleotide molecule of the pair to
specifically hybridise to at least one such sugar pathway
gene of any bacteria expressing the bacterial O antigen
present in the sample and detecting any specifically
hybridised oligonucleotide molecules.

24. The method according to claim 22 or 23 wherein
the O antigen is expressed by E. coli or S. enterica.

25. The method according to claim 24 wherein the 
E. coli are of the O111 or the O157 O antigen serotype.



26. The method according to claim 24 wherein the
S. enterica express the C2 or B O antigen serotype.
27. The method according to any one of claims 22
to 26 wherein the method is performed according to the 
polymerase chain reaction method. 
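Claim 27 performs the paired-oligonucleotide method by the polymerase chain reaction. A primer pair gives a product only when the forward primer matches the template and the reverse complement of the reverse primer lies downstream of it on the same strand. The Python sketch below models that geometry with an exact-match in silico PCR. The function names, the 3000-nucleotide size cap and the exact-match rule are assumptions. Real amplification can tolerate some mismatches, and only the template strand as given is searched. The example uses a 40-nucleotide stretch copied from the sequence listing.

```python
# Sketch: exact-match in silico PCR for a primer pair (claim 27).
# Names, the size cap and the exact-match rule are assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence (returned in uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, fwd, rev, max_len=3000):
    """Return (1-based start, amplicon) pairs on a linear template.

    Both primers are written 5' to 3'. The forward primer must occur in the
    template as given; the reverse primer anneals where its reverse complement
    occurs downstream. For each forward site only the nearest downstream
    reverse site is used, so the shortest product is reported.
    """
    template, fwd = template.upper(), fwd.upper()
    rev_site = reverse_complement(rev)
    products = []
    start = template.find(fwd)
    while start != -1:
        site = template.find(rev_site, start + len(fwd))
        if site != -1:
            end = site + len(rev_site)
            if end - start <= max_len:
                products.append((start + 1, template[start:end]))
        start = template.find(fwd, start + 1)
    return products


FRAGMENT = "GAAGATACTTATTACTGGCGGGGCAGGTTTTATTGGATCA"  # from the sequence listing
for start, amplicon in in_silico_pcr(FRAGMENT, "GAAGATACTT", "TGATCCAATA"):
    print(start, len(amplicon))
```

Swapping the two primers yields no product, because neither primer then sits in the orientation required for extension towards the other.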

28. The method according to any one of claims 22
to 26 wherein the oligonucleotide molecules are selected
from the group of nucleic acid molecules according to any 
one of claims 5 to 11. 

29. A method for testing a food derived sample for
the presence of one or more particular bacterial O 
antigens, the method being according to any one of claims 
16 to 28. 

30. A method for testing a faecal derived sample for
the presence of one or more particular bacterial O 
antigens, the method being according to any one of claims 
16 to 28. 
31. A method for testing a sample derived from a
patient for the presence of one or more particular 
bacterial O antigens, the method being according to any 
one of claims 16 to 28. 

32. A kit comprising a first vial containing a first 
nucleic acid molecule capable of specifically hybridising 
to: (i) a gene encoding a transferase, or (ii) a gene 
encoding an enzyme for transport or processing 
oligosaccharide or polysaccharide units, including a wzx 
or wzy gene, wherein said gene is involved in the 
synthesis of a bacterial polysaccharide. 



33. The kit according to claim 32 further comprising
in the first vial, or in a second vial, a second nucleic
acid molecule capable of specifically hybridising to: (i)
a gene encoding a transferase, or (ii) a gene encoding an
enzyme for transport or processing oligosaccharide or
polysaccharide units, including a wzx or wzy gene, wherein
said gene is involved in the synthesis of a bacterial
polysaccharide, and wherein the sequence of the second
nucleic acid molecule is different from the sequence of
the first nucleic acid molecule.
34. The kit according to claim 33 further comprising
a nucleic acid molecule derived from a sugar pathway gene.

35. A kit according to claim 32 further comprising
in the first vial, or in a second vial, a second nucleic
acid molecule capable of specifically hybridising to a
sugar pathway gene.

36. A kit according to any one of claims 32 to 35
wherein the nucleic acid molecules are approximately 10 to
20 nucleotides in length.

37. A kit comprising a first vial containing a
first nucleic acid molecule capable of specifically
hybridising to: (i) a gene encoding a transferase, or (ii)
a gene encoding an enzyme for transport or processing
oligosaccharide or polysaccharide units, including a wzx
or wzy gene, wherein said gene is involved in the
synthesis of a bacterial O antigen.

38. The kit according to claim 37, further
comprising in the first vial, or in a second vial, a
second nucleic acid molecule capable of specifically
hybridising to: (i) a gene encoding a transferase, or (ii)
a gene encoding an enzyme for transport or processing
oligosaccharide or polysaccharide units, including a wzx
or wzy gene, wherein said gene is involved in the
synthesis of a bacterial O antigen, and wherein the
sequence of the second nucleic acid molecule is different
from the sequence of the first nucleic acid molecule.






39. A kit according to claim 37 further comprising
in the first vial, or in a second vial, a second nucleic
acid molecule capable of specifically hybridising to a
sugar pathway gene.

40. The kit according to claim 38 further comprising 
a nucleic acid molecule derived from a sugar pathway gene. 


41. The kit according to any one of claims 37 to 40 
wherein the nucleic acid molecules are approximately 10 to 
20 nucleotides in length. 

42. The kit according to any one of claims 31 to 34
wherein the first and second nucleic acid molecules are
according to any one of claims 5 to 11.
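Claims 22 to 28 describe detection by specific hybridisation of oligonucleotide pairs, optionally by PCR. As an illustration only, a candidate primer pair can be screened in silico against a known gene-cluster sequence. The sketch below assumes exact matching on one strand: a product is predicted when the forward primer occurs in the template and the reverse complement of the reverse primer occurs downstream of it. The function names, the `max_len` cutoff and the example primers (designed on the first sequence row that follows) are hypothetical and are not part of the specification. Real hybridisation also tolerates mismatches and depends on annealing conditions.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def predicted_amplicon(template: str, forward: str, reverse: str,
                       max_len: int = 3000) -> Optional[str]:
    """Hypothetical exact-match in-silico PCR on one strand of `template`.

    Returns the predicted product (forward primer through the reverse
    primer's binding site), or None if no product of at most max_len exists.
    """
    template, forward = template.upper(), forward.upper()
    rev_site = reverse_complement(reverse.upper())
    start = template.find(forward)
    while start != -1:
        # The nearest downstream binding site gives the shortest product.
        end = template.find(rev_site, start + len(forward))
        if end != -1 and end + len(rev_site) - start <= max_len:
            return template[start:end + len(rev_site)]
        start = template.find(forward, start + 1)
    return None


# Hypothetical primer pair designed on the first 60-nt sequence row below.
TEMPLATE = "GATCTGATGGCCGTAGGGCGCTACGTGCTTTCTGCTGATATCTGGGCTGAGTTGGAAAAA"
FORWARD = "GATCTGATGGCCGTAGGG"
REVERSE = "TTTTTCCAACTCAGCCCA"  # reverse complement of TGGGCTGAGTTGGAAAAA

product = predicted_amplicon(TEMPLATE, FORWARD, REVERSE)
print(len(product))  # 60: the two primers span the whole row
```

Because `str.find` returns the nearest downstream binding site, the first product found for a given forward site is also the shortest one. A fuller screen would repeat the search on the reverse complement of the template.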
GATCTGATGGCCGTAGGGCGCTACGTGCTTTCTGCTGATATCTGGGCTGAGTTGGAAAAA
ACTGCTCCAGGTGCCTGGGGACGTATTCAACTGACTGATGCTATTGCAGAGTTGGCTAAA 
AAACAGTCTGTTGATGCCATGCTGATGACCGGCGACAGCTACGACTGCGGTAAGAAGATG 
GGCTATATGCAGGCATTCGTTAAGTATGGGCTGCGCAACCTTAAAGAAGGGGCGAAGTTC 
CGTAAGAGCATCAAGAAGCTACTGAGTGAGTAGAGATTTACACGTCTTTGTGACGATAAG 
CCAGAAAAAATAGCGGCAGTTAACATCCAGGCTTCTATGCTTTAAGCAATGGAATGTTAC 
TGCCGTTTTTTATGAAAAATGACCAATAATAACAAGTTAACCTACCAAGTTTAATCTGCT 
TTTTGTTGGATTTTTTCTTGTTTCTGGTCGCATTTGGTAAGACAATTAGCGTGAGTTTTA 
GAGAGTTTTGCGGGATCTCGCGGAACTGCTCACATCTTTGGCATTTAGTTAGTGCACTGG 
TAGCTGTTAAGCCAGGGGCGGTAGCTTGCCTAATTAATTTTTAACGTATACATTTATTCT 

TGCCGCTTATAGCAAATAAAGTCAATCGGATTAAACTTCTTTTCCATTAGGTAAAAGAGT 

GTTTGTAGTCGCTCAGGGAAATTGGTTTTGGTAGTAGTACTTTTCAAATTATCCATTTTC 



LECDMKKIVIIGNVASMMLR
TTAGAATGTGATATGAAAAAAATAGTGATCATAGGCAATGTAGCGTCAATGATGTTAAGG 

FRKELIMNLVRQGDNVYCLA 
TTCAGGAAAGAATTAATCATGAATTTAGTGAGGCAAGGTGATAATGTATATTGTCTAGCA 

NDFSTEDLKVLSSWGVKGVK
AATGATTTTTCCACTGAAGATCTTAAAGTACTTTCGTCATGGGGCGTTAAGGGGGTTAAA 

FSLNSKGINPFKDIIAVYEL
TTCTCTCTTAACTCAAAGGGTATTAATCCTTTTAAGGATATAATTGCTGTTTATGAACTA 

KKILKDISPDIVFSYFVKPV 
AAAAAAATTCTTAAGGATATTTCCCCAGATATTGTATTTTCATATTTTGTAAAGCCAGTA 

IFGTIASKLSKVPRIVGMIE 
ATATTTGGAACTATTGCTTCAAAGTTGTCAAAAGTGCCAAGGATTGTTGGAATGATTGAA 

GLGNAFTYYKGKQTTKTKMI 
GGTCTAGGTAATGCCTTCACTTATTATAAGGGAAAGCAGACCACAAAAACTAAAATGATA 

KWIQILLYKLALPMLDDLIL 
AAGTGGATACAAATTCTTTTATATAAGTTAGCATTACCGATGCTTGATGATTTGATTCTA 

LNHDDKKDLIDQYNIKAKVT 
TTAAATCATGATGATAAAAAAGATTTAATCGATCAGTATAATATTAAAGCTAAGGTAACA 

VLGGIGLDLNEFSYKEPPKE
GTGTTAGGTGGGATTGGATTGGATCTTAATGAGTTTTCATATAAAGAGCCACCGAAAGAG 

KITFIFIARLLREKGIFEFI
AAAATTACCTTTATTTTTATAGCAAGGTTATTAAGAGAGAAAGGGATATTTGAGTTTATT 

EAAKFVKTTYPSSEFVILGG
GAAGCCGCAAAGTTCGTTAAGACAACTTATCCAAGTTCTGAATTTGTAATTTTAGGAGGT 
MLLCCIHINVYYLL
CGATTTAGATGGCAGTTGATGTTACTATGCTGCATACATATCAATGTATATTATTTACTT 780






Figure 7/1 
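Each amino-acid row in these figures is the one-letter translation of the nucleotide row beneath it, one residue per codon, with `*` marking a stop codon. The sketch below shows that correspondence. It assumes the standard genetic code (the figures do not name a code table) and translation from the first base of the string. The `translate` name is illustrative only. The check uses the first translated row of orf1 above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, starting at its first base."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# First translated row of orf1 in Figure 7/1.
print(translate("TTAGAATGTGATATGAAAAAAATAGTGATCATAGGCAATGTAGCGTCAATGATGTTAAGG"))
# LECDMKKIVIIGNVASMMLR
```

In many rows the reading frame does not begin at the first printed base because a codon straddles two lines. Those rows must be rejoined with their neighbours and sliced to the correct frame before translation.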



PCT/AU98/00315 
WO 98/50531 

8/58 



FESNNPFSLQKNEIESLRKE

TTTGAGAGTAATAATCCTTTCTCATTACAAAAAAATGAAATTGAATCGCTAAGAAAAGAA 

HDLIYPGHVENVQDWLEKSS
CATGATCTTATTTATCCTGGTCATGTGGAAAATGTTCAAGATTGGTTAGAGAAAAGTTCT 

VFVLPTSYREGVPRVIQEAM
GTTTTTGTTTTACCTACATCATATCGAGAAGGCGTACCAAGGGTGATCCAAGAAGCTATG 

AIGRPVITTNVPGCRDIIND
GCTATTGGTAGACCTGTAATAACAACTAATGTACCTGGGTGTAGGGATATAATAAATGAT 

GVNGFLIPPFEINLLAEKMK 
GGGGTCAATGGCTTTTTGATACCTCCATTTGAAATTAATTTACTGGCAGAAAAAATGAAA 

YFIENKDKVLEMGLAGRKFA 
TATTTTATTGAGAATAAAGATAAAGTACTCGAAATGGGGCTTGCTGGAAGGAAGTTTGCA 

EKNFDAFEKNNRLASIIKSN
GAAAAAAACTTTGATGCTTTTGAAAAAAATAATAGACTAGCATCAATAATAAAATCAAAT 

End of orf1

NDF*
AATGATTTTTGACTTGAGCAGAAATTATTTATATTTCAATCTGAAAAATAAAGGCTGTTA
Start of orf2

MNKVALITGITGQDGSYLA 

TTATGAATAAAGTGGCATTAATTACTGGTATCACTGGGCAAGATGGCTCCTATTTGGCAG 

ELLLEKGYEVHGIKRRASSF 
AATTATTGTTAGAAAAAGGTTATGAAGTTCATGGTATTAAACGCCGTGCATCTTCATTTA 

NTERVDHIYQDSHLANPKLF 
ATACTGAGCGAGTGGATCACATCTATCAGGATTCACATTTAGCTAATCCTAAACTTTTTC 

LHYGDLTDTSNLTRILKEVQ 
TACACTATGGCGATTTGACAGATACTTCCAATCTGACCCGTATTTTAAAAGAAGTTCAAC 

PDEVYNLGAMSHVAVSFESP 
CAGATGAAGTTTACAATTTGGGGGCGATGAGCCATGTAGCGGTATCATTTGAGTCACCAG 

EYTADVDAIGTLRLLEAIRI
AATACACTGCTGATGTTGATGCGATAGGAACATTGCGTCTTCTTGAAGCTATCAGGATAT 2340

LGLEKKTKFYQASTSELYGL 
TGGGGCTGGAAAAAAAGACAAAATTTTATCAGGCTTCAACTTCAGAGCTTTATGGTTTGG 

VQEIPQKETTPFYPRSPYAV
TTCAAGAAATTCCACAAAAAGAGACTACGCCATTTTATCCACGTTCGCCTTATGCTGTTG 

AKLYAYWITVNYRESYGMFA 
CAAAATTATATGCCTATTGGATCACTGTTAATTATCGTGAGTCTTATGGTATGTTTGCCT 

CNGILFNHESPRRGETFVTR
GCAATGGTATTCTCTTTAACCACGAATCACCTCGCCGTGGCGAGACCTTTGTTACTCGTA 

KITRGIANIAQGLDKCLYLG
AAATAACACGCGGGATAGCAAATATTGCTCAAGGTCTTGATAAATGCTTATACTTGGGAA 

NMDSLRDWGHAKDYVKMQWM 
ATATGGATTCTCTGCGTGATTGGGGACATGCTAAGGATTATGTCAAAATGCAATGGATGA 
MLQQETPEDFVIATGIQYSV
TGCTGCAGCAAGAAACTCCAGAAGATTTTGTAATTGCTACAGGAATTCAATATTCTGTCC 

REFVTMAAEQVGIELAFEGE
GTGAGTTTGTCACAATGGCGGCAGAGCAAGTAGGCATAGAGTTAGCATTTGAAGGTGAGG 

GVNEKGVVVSVNGTDAKAVN
GAGTAAATGAAAAAGGTGTTGTTGTTTCGGTCAATGGCACTGATGCTAAAGCTGTAAACC 

PGDVIISVDPRYFRPAEVET
CGGGCGATGTAATTATATCTGTAGATCCAAGGTATTTTAGGCCTGCAGAAGTTGAAACCT 

LLGDPTNAHKKLGWSPEITL 
TGCTTGGCGATCCTACTAATGCGCATAAAAAATTAGGATGGAGCCCTGAAATTACATTGC 

REMVKEMVSSDLAIAKKNVL
GTGAAATGGTAAAAGAAATG GTTTCCAGCGATTTAGCAATAGCGAiWiAGAACOTCTTGC 
End of orf2

LKANNIATNIPQE*

TGAAAGCTAATAACATTGCCACTAATATTCCGCAAGAArA^AAAGATAATACATTA/AT 312 0 



Start of orf3 

M F 

AATTAAAAATGGTGCTAGATTTATTAGTACCATTATTTTTTTTTGGGTGACTAMIGT'^ ^A 



ITSDKFREIIKLVPLVSIDL
TTACATCAGATAAATTTAGAGAAATTATCAAGTTAGTTCCATTAGTATCAATTGATCTGC

LIENENGEYLFGLRNNRPAK
TAATTGAAAACGAGAATGGTGAATATTTATTTGGTCTTAGGAATAJiTCGACCGGCCAAA/ ' i 

NYFFVPGGRIRKNESIKNAF 
ATTATTTTTTTGTTCCAGGTGGTAGGATTCGCAAAA/iT G AATCTATTA^iAJiATGCTTTTA r 

KRIS SMELGKEYGISGSVFN 
AAAGAATATCATGTATGGAATTAOGT/iAAGAGTATGGTATTTCAGGAAGTOTTTTTAATG 

GVWEHFYDDGFFSEGEATHY 
GTGTATGGGAACATTTCTATGATGATGGTTTTTTTTCTGAAGGCGAGGCAACACATTATA 

IVLCYTLKVLKSELNLPDDQ
TAGTGCTTTGTTACACACTGAAAGTTCTTAAAAGTGAATTGAATCTCCCAGATG/iTC/i/iC 

HREYLWLTKHQINAKQDVHN
ATCGTGAATACCTTTGGCTAAGTAAAGACCATVATAAATGCTAAACA^iGATGTTCATAAGT 
End of orf3 Start of orf4

YSKNYFL* M 
ATTCAAAAAATTATTTTTTGrAATTTTTATTAAAAATTAATATGCGAGAGAATTGTA gCT 3 6 60 



SQCLYPVIIAGGTGSRLWPL
GTCAATGTCTTTACCCTGTAATTATTGCCGGAGGAACCGGAAGCCGTCTATGGCCGTTGT 3720 



SRVLYPKQFLNLVGDSTMLQ 
CTCGAGTATTATACCCTAAACAATTTTTAAATTTAGTTGGGGATTCTACAATGTTGCAJiJi 

TTITRLDGIECENPIVICNE
GAACAATTACGCGTTTGGATGGCATCGAATGCGAAAATCCJATTGTTATCTGCAATG^iG 






DHRFIVAEQLRQIGKLTKNI
ATCACGGATTTATTGTAGCAGAQCAATTACGACAGATTGGTAAGGTAACCAAG^iATATTA 39 00 



ILEPKGRNTAPAIALAAFIA 
TACTTQAGCCGAAAGGCCGTAATAGTGCACCTGGGATAGGTTTAQGTGGTTTTATCGGTC 
QKNNPNDDP
AG AAG AAT AAT C C T AATG AC G AC C CTTT A ' 



V 



TTATTA G T ACTTG CGGCAGAC CACTCTATAA 



K 



V 



N 



K 
AAATACTGG T T ATG G AT AT ATTAAG A
' CTTGATGAACTACCGAAATTTAGACCAG -
ATATTTATCATAGCTGTGAATGTGCAA l



, C C G C T AC AG C AAATAH . J A tj ATATG G AC TTT C T C C 
N



V M 



GAATTAACGAGGCTGAGTTTATTAATTQ 

EKTKDAVVL 
AAAAAACAAAAGACGCTGTAGTTCTTCCGA ' 



TCCTGAAGAGTCTATCGATTAT G CTGT G ATCG 



DIGWNDVGS 
^A G ATATTGGCTCGAA 'i'G ACCTGGGTT C TT 



WDISQKDCHG 



N V C H 



D 



V L N H 



N 



V 



TGCTCAATCATOAT G GAGAAAATAGTTTTATTO 



ACTCTGAGTCAAGTCTGGTTGCGACAG 



Y M H R 



V 



W 



K 



ACTACATGCATCGTGCAGTTTTTCGCCCTTGGGG 1 



r TAAATT CG AT G C AAT AG AC C AAG G CO - 



DRYRVKKI I 
ATAG ATAT AG AGT AAAAAAAATAAT AG TT 



K 



R M 



' AAACC AG GAG AAG G QTTAG ATTT AAG C AT G C 



K V 



S N 



Q G 



K 



v 



GTCTTGAQAATCCAGGCGTAATACCTTTGCATCTAATTGAAC 1 



■ TAAGTTCTGGTGATTACC 
V R



N 



K 



TTGAATCAGATGATATAGTGCGTTTTACTGA * 



g AG AT AT AAC AGT AAAC AATTC C T AAAG C - 
TTATTACAAGGGAAACTTGAGCCAACAAATGAACCTTATGCTATCGCAAAAATTGCAGGT

IKLCESYNRQFGRDYRSVMP 
ATTAAATTATGTGAATCTTATAACCGTCAGTTTGGGCGTGATTACCGTTCAGTAATGCCA 

TNLYGPNDNFHPSNSHVIPA 
ACCAATCTTTATGGTCCAAATGACAATTTTCATCCAAGTAATTCTCATGTGATTCCGGCG 

LLRRFHDAVENNSPNVVVWG
CTTTTGCGCCGCTTTCATGATGCTGTGGAAAACAATTCTCCGAATGTTGTTGTTTGGGGA 



VMEMPYDIWQKNTKVMLSHI
GTCATGGAGATGCCATACGATATATGGCAAAAAAATACTAAAGTAATGTTGTCTCATATC 



VGYKGHITFDTTKPDGAPRK
GTAGGTTATAAAGGGCATATTACGTTCGATACAACAAAGCCCGATGGAGCCCCTCGAAAA 






SGTPKREFLHVDDMASASIY 
AGTGGTACTCCAAAGCGTGAATTCTTACATGTAGATGATATGGCTTCTGCAAGCATTTAT 8280






NIGTGIDCTICELAETIAKV 
AATATTGGAACAGGTATTGACTGCACGATTTGTGAGCTTGCGGAAACAATAGCAAAAGTT 8400






LLDVTLLHQLGWNHKITLHK 
CTACTTGATGTAACGCTTCTTCATCAACTAGGTTGGAATCATAAAATTACCCTTCACAAG 8520
End of orf8






ggtcttgaaaataca^ 8580 

Start of orf9 rr, TV RSTPblSl. 

^TTTTTACATTCCC^ 8640 

a^ga^ggaaaa^ 8700 

cacagggctattgg^^ 8760 

cct?tgLcLttgacag^ 8820 

TTTATGGTATCTGGCAGCACTTCTACGAAGACAATAGTATGGG^ ^80 
LTLAkG?TA^GCAT?CCTTC^TALTiACAAC?AA^CAiTTTCALTTACCGAAGT 

CACA^CATAATGCTTATTGCTGGCTATCGCGAGCAAAGCTGATAAA 

5 OT mj5ttStcS^^ 9060 

8t T f 1° P ^ I" ^ V V M A G G T G S 

Lgata^mStctga^ 9120 

tcgtctttLccacttt^ 9180 

TAACACCTTGTTACA^^^ 9240 

ag^aa^tgLcLcatcgct^ 9300 

ATTAJ^TGGTAATATTATTCTAGAACCATGCGGGCGAAATACTGCACCAGC 9360 
ATCTGCGTTTCATGCGTTAAA^CGTAATCCTCAGGAAGATCCATTGC 

catcg^taatcaagStaaaa^^ 9540 

ttItgIgtotaJw^^ 9600 

tttttattatg^t^ 9660 

GACTTCTGG^AATcLtATTG^ 9720 

Figure 8/8 
tgaggaattgagaaa^tttagacct



s y i d L 



D F 1 R L S aaLgJacJatJtc?agattgtc?tgc 



^CAGCATGTAAAAAAAATAGTCGAAAT 



CTCATACATTGATCTAGATTTTATTCGATTATC. 

tgLtcta^gatt^tgctg^^ 

aacaggagatgtatgtaa^ggtgatatattaacctatgatac^ 
ctctgLt^gcgt^ag^ 

DA VLVSKKSDVQHVKKI V E M 

agatgccgttcttgtgtctaaaaagagtg ATGTAC * r?r * 

cc^taLttgcagcL^ 
aaLtttcJttcga^tgacc^^ 

tggtgaggggctttctttaaggatgcatcaccatcgttctgaaca^ 

G T A K V T ^__9_ 1 ° T ^ AA ^ TAAACTAGTC 

atacattccccttggcgcagcgtatagtcttgLa^tc^gggcataatccctcttaatct 

End of orf10 Start of orf11

ttacaaacatgaagattaacatatcaaatctttaacctgctttaaagcctatgatattcg 



TGGTACAGCAAAAGTAACCCTTGGCGATAAAACTAAACTAGTCACCGCAAATGAATCGAT 

yiplgaayslenp g i ,i„p_l_n_l 



K 



N 



W R 



CGGGAAATTAGGCGAAGAACTGAA' 



EFLKPKTIV 
CGAATTTCTCAAACCGAAAACCATTGTTTTA* 



'tgaagatattgcctggcgcattgggcgtgcctatgg 






V 



.ggcggtgatgtccgcctcaccagcgaagc 



K 



K 



V 



V 



gttaaaactggcgcttgcgaaaggtttacagga' 



.tg< 



cgggcgtcgatgtgctggatatcgg 



msgteeiy 
tatgtccggcaccgaagagatctattto 



V 



F A T F H L G 

GCCACGTTCCATCTCGGAGTGGATGGCGGCAT 



H N 



CGAAGTTACCGCCAGCCATAACCCGATGGA' 



MDYNGMKLVRt,^ 
.TTACAACGGCATGAAGCTGGTGCGCGAAGG 



V 



N 



A R P I S G _ D _^ CG ^ ACTGCGCGATGTCC AGCGTCTGGCAGAAGCCAA 



GGCTCGCCCGATCAGCGGTGATAC 



TGACTTCCCTCCTGTCGATGAAACCAAACG' 



Itggtcgctatcagcaaatcaatctgcgtga 



CGCTTACGTTGATCACCTGTTCGGTTATATCAACGTCAAAAACCTCACGCCGCTCAAGCT 11040



GG' 



V I N S G N G A 111Q0 

N 



TGATCAACTCCGGGAACGGCGCAGCGGGTCCGGTG 



TAAAGCCCTCGGCGCACCGGTGGAATTAATCAA^ 1H60 

ccccaacggtaJtcctaac^^ 11220 
g^x£a*L^^ 11340 



F L E K N P G ^__ K „.^,^ C ^ CG ^ TCCACGTCTCTC CTGGAACAC 11400 

V M S K 



GTTCCTCGAAAAAAATCCCGGCGCGAAGATCATC 1 

cg^gatg^ggJgac^g^ 11460 

TATTAAAGAACGTATGCGCAAGGA^^ 11520 

TTA cttccgtgatttcgct^^ 
AC tgg?gtgcctgaL^^ "640 






tccggcaagcggtgagatcaacagcaaactggcgcaacccgttga^ 1"00 
ccLcagcattttagccgcgLg^ 

ctStgccgac^gcgct^ 11820 

tc ^ g Ltcacgcggtg^^ 11880 



End of orf11

_ _ E * 
GCTAAGTGAG TGATTATTTAC ATTAATC A' 

GTTATTGCGGTATATGATGAATATGTGGGCTTTTTTATGTATAACGACTATACCGCAACT 



L S E * .TTAAGCGTATTTAAGATTATATTAAAGTAAT 11940 




ttmctSKL^™^^ 12060 
acgattgagacgttcctttgcttaagacattttttcatcgcttatgtaataacaaatctg 

ccttatataaaaaggagaacaaaatggaacttaaaataattgagacaatagatttttatt 

atccctgtttacgatattatagccaaagttgtatcctgcatcagtcctgcaatatttcac 

gagtgctttgttaactgaatacatgtctgccattttccagatgataacgacgtcatcgca 

attgatggtaaaacacttcggcacacttatgacaagagtcgtcgcagaggagtggttcat 
GTCATTAGTGCG-TTTCAGCAATGCACAGTC-TGGTCCTCGGATAGATCAAGACGGATGAGA
AACCTAATGCGTTCACAGTTATTCATGAACTTTCTAAAATGATGGGTATTAAAGGAAAAA 
TAATCATAACTGATGCGATGGCTTGCCAGAAAGATATTGCAGAGAAGATATAAAAACAGA 
GATGTGATTATTTATTCGCTGTAAAAGGAAATAAGAGTCGGCTTAATAGAGTCTTTGAGG 
AGATATTTACGCTGAAAGAATTAAATAATCCAAAACATGACAGTTACGCAATTAGTGAAA 
AGAGGCACGGCAGAGACGATGTCCGTCTTCATATTGTTTGAGATGCTCCTGATGAGCTTA 
TTGATTTCACGTTTGAATGGAAAGGGCTGCAGAATTTATGAATGGCAGTCCACTTTCTCT 
CAATAATAGCAGAGCAAAAGAAAGAATCCGAAATGACGATCAAATATTATATTAGATCTG 
CTGCTTTAACCGCAGAGAAGTTCGCCACAGTAAATCGAAATCACTGGCGCATGGAGAATA 
AGTTGCACAGTAGCCTGATGTGGTAATGAATGAAATCGACTATAATATAAGAAGGCGAGT 
TGCATTCGAATGATTTTCTAGAATGCGGCACATCGCTATTAATATCTGACAATGATAATG 
TATTCAAGGCAGGATTATCATGTAAGATGCGAAAAGCAGTCATGGACAGAAACTTCCTAG 

End of the H-repeat

CGTCAGGCATTGCAGCGTGCGGGCTTTCATAATCTTGCATTGGTTTTGATAAGATATTTC 






Start of orf12 ^paGSYGRE
TTTGGAGATGGGAAAA^GAATTTGTATGGTATTTTTGGTGCTGGAAGTTATGGTAGAGAA 13200 

ACAATACCCATTCTAAATCAACAAATAAAGCAAGAATOT 13260 

^tgtggatgaWJtt^ 13320 

N V A I A N__D_K 

TGCTTTCTAAAAGCCCCTTATT 1 " 



C F L K A P Y — ^^^j^^y^QipATTTTAATGTTGCTATTGCTAATGATAAG 13380 



TACGACAGAGAGTGTCTGAGTCAATATTATTACACGGGGTTGAACCAAT 13440 



ATACGAC 

CAT ccaaatag^ 13500 

F V T I S T N — T — ?~^^qqqaGGTTTTTTCATGCAAACATATACTCA 13560 



TTTGTTACAATATCTACTAATACTCATA1 



K 



rp^QQTTGCACATGATTGTCAAATAGGAGACTATGTTACATTTGCTCCTGGGGCTAAATGT 13620 



^TGGATATG^TA^ 13680 

g 2tgttcctaatcgcc^ 13740 

^^S^^ 13800 

End of orf12

S 1 TAATGGGAATGCGAAAACACGTTCCAAATGGGACTAATGTTT 13860 



AGATCGCCAACATCTATT 
AAAATATATATAATTTCGCTAATTTACTAAATTATGGCTTCTTTTTAAGCTATCCTTTAC 13920
TTAGTTATTACTGATACAGCATGAAATTTATAATACTCTGATACATTTTTATACGTTATT 13980
CAAGCCGCATATCTAGCGGTAACCCCTGACAGGAGTAAACAATG 14024 
GTTGACAAATACCGACCGTATAATGAATCAAACGTTCTGGATTGGTATTTATCCAGGCTT

GACTACAGAGCATTTAGATTATGTCGTAAGTAAGTTTGAAGAATTTTTTGGTTTAAATTT 









Start of abe 

MLDVNKKILMTGAT 

CTAATTTTTAGGATAGGATCCTTGATGTGAATAAGAAAATCCTAATGACTGGCGCTACTA 180



SFVGTHLLHSLIKEGYSIIA
GCTTTGTAGGTACCCATCTACTACATAGTCTCATAAAGGAAGGTTATAGTATTATTGCAT 

LKRPITEPTI INTLIEWLNI 
TAAAGCGTCCTATAACCGAGCCAACGATTATCAATACCTTGATTGAATGGTTGAATATAC 

QDIEKICQSSMNIHAIVHIA
AAGATATAGAAAAAATATGTCAATCATCTATGAATATTCATGCGATTGTCCATATTGCAA 

TDYGRNRTPISEQYKCNVLL
CAGACTATGGTCGAAACAGAACCCCTATATCTGAACAATATAAATGTAATGTCCTATTAC 

PTRLLELMPALKTKFFISTD 
CAACAAGACTGCTTGAGTTAATGCCAGCGCTTAAAACGAAATTCTTTATTTCTACTGACT 

SFFGKYEKHYGYMRSYMASK 
CTTTTTTTGGGAAATATGAGAAGCACTATGGATATATGCGTTCTTACATGGCATCTAAAA 

RHFVELSKIYVEEHPDVCFI 
GACATTTTGTAGAACTATCAAAAATATACGTAGAGGAACATCCAGACGTTTGTTTTATAA 

NLRLEHVYGERDKAGKIIPY
ATTTACGTTTAGAACATGTTTACGGTGAGAGGGATAAAGCAGGTAAAATAATCCCGTATG 

VIKKMKNNEDIDCTIARQKR 
TTATCAAAAAAATGAAAAACAATGAAGATATTGATTGTACGATCGCCAGGCAGAAAAGAG 

DFIYIDDVVSAYLKILKEGF 
ATTTTATTTATATAGACGATGTTGTTTCGGCCTATTTGAAAATTTTAAAGGAGGGTTTTA 

NAGHYDVEVGTGKSIELKEV 
ACGCTGGACACTATGATGTCGAGGTGGGGACTGGAAAATCGATAGAGCTAAAAGAAGTGT 

FEIIKKETHSSSKINYGAVA
TTGAGATAATAAAAAAAGAAACGCATAGTAGTAGTAAGATAAATTATGGTGCAGTTGCGA 

MRDDEIMESHANTSFLTRLG
TGCGTGATGATGAGATTATGGAGTCACATGCAAATACCTCTTTCTTGACTCGATTAGGTT 

End of abe Start of wzx

M 

WSAEFSIEKGVKKMLSMKE*

GGAGTGCCGAGTTTTCTATTGAGAAGGGTGTGAAAAAAATGTTGAGTATGAAAGAGTAAT 1020






NRIIRMLGVDKAIRYVIFGK
GAATCGTATTATTAGAATGTTAGGTGTAGATAAAGCAATTCGTTATGTTATTTTTGGTAA 

IISVLTGLLLIMLISHHLSK 
GATAATATCTGTATTAACGGGTTTACTGTTAATAATGTTAATATCACACCATTTATCTAA 

DAQGYYYTFNSVVALQIIFE
AGACGCACAGGGCTATTATTATACATTTAATTCAGTAGTGGCACTACAGATAATATTTGA 



LGLSTVIIQFASHEMSALKY
ATTGGGGCTATCAACGGTAATCATTCAATTCGCTAGCCATGAAATGTCAGCGTTAAAATA 

DYSERDIIGESKNKQRYLSL 
TGATTATTCTGAACGAGATATTATAGGTGAAAGTAAAAATAAGCAACGTTACCTATCGTT 

FRLAIKWYAVIALLIILIVG
ATTTCGGTTGGCAATAAAATGGTATGCAGTAATAGCTTTGCTAATAATATTAATAGTCGG 

PIGYVFFTQKEGLGVPWQGA 
TCCCATCGGGTATGTTTTTTTTACGCAAAAAGAAGGCTTAGGTGTACCTTGGCAAGGGGC 

WLLLTIVTAFNIFLVSVLSV 
ATGGTTATTATTAACAATAGTTACAGCTTTTAATATTTTTCTTGTTTCTGTACTTTCTGT 

AEGSGLITDVNKMRMYQSLL
CGCTGAAGGGAGTGGGTTAATTACTGATGTGAATAAAATGAGAATGTATCAGTCGCTGTT 

AGILAVSLLISGFGLYATSA 
AGCTGGTATATTGGCAGTAAGCTTACTTATTAGTGGCTTTGGACTATATGCTACGTCTGC 

TAISGTIIFSIFSYKYFKKI 
AATAGCTATTTCAGGGACTATCATATTCTCCATATTTTCATATAAGTATTTTAAAAAAAT 

FLQSLKHKNKYTEGGISWVN
TTTCCTGCAATCTTTAAAGCATAAAAATAAATATACTGAAGGTGGTATTTCATGGGTTAA 

EIFPMQWRIALSWMSGYFIY
TGAAATATTTCCTATGCAATGGCGAATTGCTCTAAGTTGGATGTCAGGGTATTTTATTTA 

FVMTPIAFKYFGAIYAGQLG 
TTTTGTTATGACCCCCATTGCATTCAAATATTTCGGGGCTATATATGCAGGGCAGTTAGG 

MSLTLCNMVMATGLAWISTK 
GATGTCTTTAACATTATGCAATATGGTAATGGCTACGGGCCTGGCTTGGATATCCACTAA 

YPKWGVMVSNKQLAELSKSF 
ATATCCAAAATGGGGAGTAATGGTTTCCAACAAACAGCTTGCGGAACTGAGTAAATCGTT 

KSAVMQSSFFVLTGLTGVYI 
CAAAAGTGCAGTAATGCAATCATCCTTTTTTGTCTTGACAGGATTAACTGGTGTATACAT 

SLWLLKLSGSNIGERFLGLQ 
TTCATTATGGTTATTGAAATTATCTGGTTCAAACATTGGCGAGCGGTTTTTGGGATTGCA 

DFFFLSLAIIGNHIVACFAT
GGATTTTTTCTTTTTATCTTTAGCAATTATTGGTAATCACATTGTAGCTTGCTTTGCAAC 

YIRAHKTEKMTLASCIMALL
CTATATAAGAGCGCATAAAACTGAAAAAATGACATTGGCATCATGTATAATGGCTCTCTT 

TITTMLFVAYLEYSRFYMLM 
GACTATAACTACAATGTTGTTTGTTGCATATTTAGAGTACTCGAGGTTCTACATGTTAAT 

YAALTWLYFVPQTYIIFKRF

GTATGCAGCACTAACGTGGTTATATTTTGTTCCTCAAACTTATATAATCTTTAAAAGATT 
Start of wbaR End of wzx

KSSY E * LTiAlpTYNR 

caagagttcttJsagtaaaaaacctcttcttactattgctattccgacatataaccgct 

? TT catgtttggctcgW^ 2460 

AACTCGAGGTTAOTGTTTGTGATAA 2520 

gtgg^ttagataaaa^aaS^ 2580 

^aactJcc*^ 2640 

JtGCAT^AGA^GgJgtI^ 2760 

atgtcaLacgtcat^ct^^ 2820 

TCACGTTTATTTCTGGAATGATATGTAAGAAAACTGATGCAATTGTC 2880 

I F S P Q T T G K .^^^^^^^JatSgcCATTAC 2940 

NNIIEAEPD 



TTTTCAGTCCGCAAACTACTGGAAAATATCTTATGCATTTAACATGGCAATTGCCATTAC 

irJL&ui^^ 3000 

attcagotgg^tItcaW^^ 3060 

? tt ?ttLcccaSagLc^ 3120 

£ AC ttaact?ca£ag^^ 3180 

S AG attccgata§tg£a4ta^ 32 *o 

Itc£atatccta!L^tg^^ 3300 

End of wbaR

TAA^GCGGaLtIaAAATTATTCAAGATGGTTTGCTGAAAACGACTTATAGGACTATCTA 3360 

Start of wbaL LNLI ISLLSKV 

Jst^ctItag^taagaot^ 3420 

AGGCG^TCAaLgC*^^ 3480 

Figure 9/3 
gggaLaXttttaLt^ 3 540

TLGSTAGTA?GcSGG?TGi T GLiTGCTSGAiTGLGS T G?ATL^TTlTGGTGAT 

GA^^TGLcCTTlTTTCTLAkcSTGATCGTAkTGTT^AAGTGATAATGTTCAT 3660 

tqpVSCLILENDILIGSKVY 
^TCTTGCGTATCATGTTTAATTTTAGAAAACGATATATTAATTGGTAGCAAAGTTTAT 






rrnHSHGSYKVCSPKIEP PA 
ATAGGCGATCATAGCCATGGCAGTTATAAAGTATGCAGTCCGAAAATAGAACCGCCAGCA 

mvdt rniAPIKIGNCCWIGD 
A^TA^GCCATTAGGTGATATTGCTCCTATTAAAATAGGTAATTGCTGCTGGATTGGAGAT 

aLgc\g?aa£tctggctg^^ 
G tcg?caLgaWtaaLg?c<^ 3960 






End of wbaL Start of wbaQ

IKVF * MNVFISICIPSYNRA 
ATAAAGGTATTTTAAAATGAATGTTTTTATCAGTATTTGTATACCGTCTTATAATAGAGC 

TGAGTTTTTAGAGCCACTACTCGATAGCATATATAATCAAGATTATTGTC 

n^PVTVCEDKSPQRDEINSI 
TGATTTTGAGGTCATTGTTTGTGAAGATAAATCTCCACAGAGAGATGAGATAAACTCTAT 

taJcgLSct^^^ 4200 
taatttaggctatga'taL^ 4260 
c^ra!c*£^ 4320 

t v-ANPEIVLATRAYGWFKEN 
TTTCaLgCTAATCCTGAAATTGTAT^ 4380 
ompt CDTVRHLTDDTLFQPG 

tc^aatgLttatgtgatactgttcgtcatttaacagacgatactttatttcagccggg 

ADAIKFFFRRVGVISGFIVN
GGCTGATGCCATTAAATTTTTCTTCCGTAGAGTTGGAGTTATTTCAGGCTTTATTGTCAA 

RFKAKKLSSDLFDGRLYYQM 

tgctgaWgcaaaaaaactatcgagtgatttatttcatgggcgtttatattatcaaat 

VLAGMLMAEGQGYYFSDVMT 

gtaccttgctggtatgctaatggctgaaggtcagggatactattttagcgacgtgatgac 
LSRDTEAPDFGNAGTEKGVF
ATTGTCGAGGGATACAGAGGCTCCTGACTTTGGTAACGCTGGAACTGAAAAAGGAGTTTT 

TPGGYKPEGRIHMVEGLLLI
CACCCCGGGGGGGTATAAACCAGAGGGCCGTATACATATGGTTGAAGGCTTGTTGCTAAT 

AKYIEDTTKIDGVYAGIRKD 
TGCAAAATATATAGAAGATACAACAAAAATTGATGGCGTTTATGCTGGAATTAGAAAAGA 

LANYFYPYIRDQLDLPLYTY 
CTTAGCGAACTATTTTTATCCTTATATTCGAGATCAACTCGACTTGCCTCTTTATACTTA 

IKMINKFRKMGFSNEKLFYV 
TATTAAAATGATAAATAAATTTCGGAAAATGGGATTTTCAAATGAAAAGCTTTTCTATGT 

HAFLGYVLKRRGYDALIKYI
GCATGCCTTTTTAGGGTATGTACTAAAACGGAGGGGCTATGATGCTTTAATTAAATACAT 

End of wbaQ 

RSKKGGTPRLGI*
TCGTAGCAAAAAAGGCGGTACTCCGCGTCTTGGTATTTAACCTCCACTTTCAAAAAATGT

TATGAATATACTTCTTGCTGCGATATTAGGCGTTAACTTATTTTCTCCATATATTAGTTC 

Start of wzy

MLPFPPGAILRDVLNV
GTGGATGGTGGGTATGCTGCCATTTCCACCAGGAGCAATCCTAAGGGATGTACTCAATGT 

FFVALVLVRFVIDRKKTYFP
ATTTTTTGTGGCGTTAGTGCTAGTTCGATTTGTCATTGATAGGAAAAAAACTTATTTCCC 

LVFTIFSWSAVILWVIALTI 
GTTGGTTTTTACTATTTTTTCATGGTCGGCGGTAATACTATGGGTAATAGCGTTAACTAT 

FSPDKIQAIMGGRSYILFPA
ATTCTCACCGGATAAAATTCAAGCAATTATGGGGGGGCGGAGTTATATTTTATTCCCGGC 

VFIALVILKVSYPQSLNIEK 
AGTTTTCATAGCATTAGTGATTTTAAAAGTATCATACCCGCAATCCTTAAATATTGAAAA 

IVCYIIFLMFMVATISIIDV
AATAGTTTGCTACATAATTTTTCTAATGTTTATGGTTGCGACAATATCTATTATTGATGT 

LMNGEFIKLLGYDEHYAGEQ
ACTAATGAATGGAGAGTTCATTAAATTGCTCGGATATGATGAGCATTATGCAGGAGAACA 

LNLINSYDGMVRATGGFSDA
ATTAAACTTAATTAATAGCTATGATGGGATGGTCCGGGCTACAGGCGGTTTTAGTGATGC 

LNFGYMLTLGVLLCMECFSQ
TCTCAATTTTGGATATATGCTCACATTAGGTGTTTTGTTATGTATGGAGTGTTTTTCCCA 

GYKRLLMLIISFVLFIAICM
AGGATATAAAAGATTATTGATGCTTATTATTAGTTTTGTGCTATTTATAGCGATCTGCAT 
SLTRGAILVAALIYALYIIS
GAGTCTTACTAGAGGAGCAATACTTGTTGCTGCGCTTATTTACGCACTTTATATAATTTC 5760 



NRKMLFCGITLFVIIIPVLA
AAATCGGAAGATGCTTTTTTGTGGAATAACTTTATTTGTAATAATTATACCCGTTTTAGC 



ISTNIFDNYTEILIGRFTDS
AATTTCTACTAATATTTTTGACAACTATACAGAAATTTTGATCGGCAGGTTTACAGATTC 

SQASRGSTQGRIDMAINSLN 
GTCTCAGGCATCGCGTGGATCTACACAGGGGCGGATAGATATGGCAATTAATTCATTAAA 

FLSEHPSGIGLGTQGSGNML 
CTTCCTGTCAGAACATCCATCAGGTATAGGTCTGGGTACTCAAGGTTCAGGAAACATGCT 

SVKDNRLNTDNYFFWIALET
TTCGGTAAAAGATAATAGGTTAAATACGGATAATTATTTTTTCTGGATCGCCCTTGAGAC 

GIIGLIINIIYLASQFYSST 
TGGTATTATTGGCTTAATCATAAATATTATTTATCTGGCAAGTCAATTTTATTCTTCAAC 

LLNRIYGSHCSNMHYRLYFL 
TTTACTAAATAGAATATATGGCAGTCATTGTAGCAATATGCACTATAGATTATATTTTCT 

FGSIYFISAALSSAPSSSTF 
CTTTGGAAGTATATATTTTATAAGTGCAGCGTTAAGTTCAGCACCTTCGTCATCAACTTT 

SIYYWTVLALIPFLKLTNRR 
TTCTATATATTATTGGACAGTTTTAGCTTTGATTCCATTTTTAAAATTAACAAATAGACG 

End of wzy Start of wbaW 

CTR*MNNKKVLMDISWSNKG 
GTGCACGCGATAATGAATAATAAAAAGGTTTTGATGGATATTAGTTGGTCTAATAAAGGG 

GIGRFTDEISKLLCDISKEE 
GGGATTGGACGTTTTACTGATGAAATTTCTAAACTACTATGTGATATATCTAAGGAGGAA 

LYRKCASPLAPLGLAVNIFL
CTATATAGAAAATGTGCTTCTCCGCTGGCCCCATTAGGTTTAGCAGTCAATATTTTTCTG 

RKKTDVVFLPGYIPPLFCSK
CGAAAGAAAACTGATGTGGTTTTTCTTCCTGGCTATATTCCACCACTTTTTTGTTCGAAA 

KFIITIHDLNHLDLNDNSSL
AAGTTCATAATAACAATACATGATCTAAATCATCTGGATTTAAATGATAATTCCTCTCTT 

FKRLFYNFIIKRGCRKAYKI
TTTAAGAGGTTATTTTATAATTTTATAATAAAGCGCGGTTGTAGAAAAGCATATAAAATA 

FTVSNFSKERIVAWSGVNPN 
TTTACAGTTTCGAATTTTTCAAAAGAAAGAATAGTAGCATGGTCAGGTGTAAACCCTAAT 

KIVTVYNGVSSLFNADVKPL
AAAATAGTCACGGTATATAATGGGGTATCTAGTCTATTTAATGCCGATGTAAAACCATTG 

NLGYKYLLCVGNRKTHKNEK 
AATTTAGGCTATAAATATTTGCTATGTGTAGGAAACAGAAAAACTCATAAGAATGAGAAG 

CVISAFAKADIDPSIKLVFT
TGTGTTATATCTGCCTTTGCCAAAGCAGATATTGATCCATCAATAAAACTCGTTTTTACT 

GNPCNDLEKLIIQHGLSERV
GGTAATCCTTGTAATGATTTAGAAAAACTAATAATACAACATGGTTTAAGTGAACGTGTA 






Figure 9/6 






A^GTTCTTOGGGTTCGTCTCTGLALGATTiAC^ATCGTTATL^GGGCTTOTTAGGA 7020 

ttag?tttcccttctt£atatgaa^^^ 7080 

GGTATTCCTGTATTAACTTCTCTAACTTC 7140 
ATTCTTGTCGA^CCTCTTTC 7200 

GA^TgLcTTCGTaLcATTTAATC^^ 7260 

Start of wbaZ

W Q N V V S E I E M V L T E A C D m G e Nj ^ 

tggcaaaacgtggttagtgagattgaaatggtactgacagaggcatgtgatggaaataaa 






End of wbaW 

*PTKISLVHEWLLSYAGSEQV 

tgaaataaaaatatctctcgttcatgagtggttattaagttatgcaggctccgaacaggt 

ATCATCTGCCATCCTGCATGTTTTTCCTGAAGCGAAGTTATATTCGGTC 7440 

aacggatgLcaaagaaSac^ 

t T>TfAKKFYQKYLPLMPLAIE 

tttacctaLgctaaaaaattttaccagaaatatttaccactaatgccactggctattga 
ac LcttgatttatcagLgc"t^ 7620 

TG^AiTT?CGGAC?AG?TcLcTTCACA^TAGC T LG?TCATTCTC?TAJTcLTlTGC 7680 

g^gatt^agcatcag^^^^ 7740 

GTTAGCAAAATG^COTCTTCACAAAATACGAATTTGGGATTCTCG 7800 

tgatcattttatagctaattctcaatatatcgcgcgtagVattaaa 7860 

tt» j. eviYPPVDVDNFEVKNEK 
TGAGGCTT5AGTTATATATCCGCCTGTAGATGTGGATAATTTTGAAGTAAAAAATGAAAA 7920 

nnvYFTASRMVPYKRIDLIV 
GCAAGACTATTATTTCACAGCATCCCGTATGGTACCCTACAAACGTATTGATCTTATTGT 

EAFSKMPEKKLVVIGDGPEM
CGAAGCCTTTAGTAAAATGCCGGAAAAGAAATTAGTAGTTATTGGTGATGGACCGGAGAT 

KKIKSKATDNIKLLGYQSFP
GAAAAAAATAAAGAGCAAGGCTACAGACAATATAAAATTGCTCGGTTATCAATCTTTTCC 
tg^tttaaLgLt^gcLagcgScaSgg^gt^tt^tgcagJgg^agaggactt 8160

TGGAATAATACCTGTCGAAGCTCA^GCTTGCGG 8220 

TGG^GScTTAGAAArcGiTC^cLcTAGGTGJAGLGLcJGAOTGScA^TTCTTCAA 8280 

gg LcLaata£tgJttctttgcatg^ 8340 

ttttacatctcaggc"ttgtag^^ 8400 






T7PKMFVNEKWNLFKTEQI IK 

agLtttaagaactttgttaatgaaaagtggaatcttttcaaaacagaacagattattaa 

End of wbaZ Start of^C R L x p v T M A G G i 

ACGTtLtTATGGTTTATTGAAISTCTAAATTAATACCAGTAATAATGGCCGGTGGGATT 852 0 

ggtagccgtttgtggccactttcacot^ 8580 

gctgLttatctatcctgcaaaaca?^ 8640 

CCTTTAGTCATTTGTAATGATAG^ 8700 

aataaactagJaaataa^ca^^^^ 8760 

gSgctcgScgJtt^ttgttcacttcLaStgtcgJcgmgLgJccJgc^tttcc^gtc 8820 

c£tg^g<£tc!!t<^ 8880 



PFFATQGKLVTFGIVPTQAE

gLttttttgcaacacaaggtaagctagtaacgtttggtattgtacccacacaggccgaa 




a ?tgg\:tacggWata^tgta^^ 9000 

gLt^AgLaLcCTGA^CGA^ 9060 

tattggaacagcggta?gt^^ 9120 
cWcccccga'tatttaccLgc^^ 9180 
gattttatccgtattgataaagaagcattc 9240 
AVMEHTRHAVVVPMNAGWSD
GCGGTAATGGAACATACTAGGCATGCAGTTGTCGTACCGATGAATGCCGGCTGGTCAGAT 

VGSWSSLWDISKKDPQRNVL
GTGGGGTCATGGTCTTCACTGTGGGATATTTCTAAGAAAGATCCACAACGTAATGTATTA 

HGDIFAYNSKDNYIYSEKSF
CATGGCGATATTTTTGCATATAATAGTAAAGATAATTATATCTATTCTGAAAAATCGTTT 

ISTIGVNNLVIVQTADALLV
ATTAGTACAATCGGAGTAAATAATTTAGTTATCGTGCAGACAGCAGATGCATTATTAGTA 

SDKDSVQDVKKVVDYLKANN 
TCTGATAAAGATTCAGTCCAGGATGTTAAAAAAGTTGTTGATTATTTAAAAGCTAATAAT 

RNEHKKHLEVFRPWGKFSVI 
AGAAACGAACATAAAAAACATTTAGAGGTTTTCCGACCGTGGGGAAAATTTAGCGTAATT 

HSGDNYLVKRITVKPGAKFA 
CATAGTGGCGATAATTATTTAGTTAAAAGAATAACTGTTAAACCAGGCGCGAAGTTTGCT 

anMHLHRAEHWIVVSGTACI 

gctcLatgcatctccatcgtgctgagcattggatagtggtatctggtactgcttgtatt 

TKGEEIFTISENESTFIPAN 
ACTAAGGGGGAAGAAATTTTTACAATTTCGGAGAATGAATCAACATTTATACCTGCTAAT 

TVHTLKNPATIPLELIEIQS
ACAGTTCATACGTTAAAAAACCCCGCGACTATTCCATTAGAACTAATAGAAATTCAATCT 

GTYLAEDDIIRLEKHSGYLE
GGCACCTATCTTGCGGAGGATGATATTATTCGCCTGGAGAAACATTCTGGATATCTGGAG 

End of manC Start of manB

MKNIYNTYDVINKSGIN 
TAATGAATTGATGAAAAATATATATAATACTTACGATGTTATCAACAAATCTGGAATTAA 

FGTSGARGLVTDFTPEVCAR
TTTTGGAACCAGTGGTGCCCGCGGCCTTGTTACCGATTTTACACCCGAAGTTTGCGCACG 

FTISFLTVMQQRFSFTTVAL
ATTTACCATTTCCTTTTTGACAGTAATGCAGCAAAGATTCTCATTTACAACGGTTGCGCT 10080 

AIDNRPSSYAMAQACAAALQ
CGCAATTGATAATCGTCCAAGCAGTTACGCGATGGCTCAAGCTTGTGCCGCTGCTTTGCA 

EKGIKTVYYGVIPTPALAHQ
AGAAAAAGGAATTAAAACCGTTTACTATGGCGTAATTCCAACACCTGCTTTAGCTCATCA 

SISDKVPAIMVTGSHIPFDR
ATCAATTTCCGATAAAGTACCTGCAATCATGGTTACTGGCAGTCATATCCCTTTTGACCG 

NGLKFYRPDGEITKDDENAI 
TAATGGCCTGAAATTTTATAGACCAGATGGTGAAATTACTAAAGATGATGAGAATGCTAT 

IHVDASFMQPKLEQLTISTI
TATTCATGTTGATGCCTCATTTATGCAGCCTAAGCTTGAACAATTGACAATTTCCACAAT 10380 
AARNYILRYTSLFPMPFLKN
CGCTGCTAGAAATTATATTCTACGATATACCTCATTATTTCCAATGCCATTCTTGAAAAA 10440 

KRIGIYEHSSAGRDLYKTLF
TAAGCGCATTGGAATTTATGAGCATTCTAGTGCGGGTCGTGATCTCTATAAGACGTTATT 10500

KMLGATVVSLARSDEFVPID 
CAAAATGTTGGGTGCTACAGTTGTTAGTTTAGCAAGGAGCGACGAATTTGTTCCTATTGA 10560

TEAVSEDDRNKAITWAKKYQ
TACTGAAGCTGTAAGTGAAGATGATAGAAATAAAGCAATCACATGGGCAAAAAAATATCA 10620

LDAIFSTDGDGDRPLIADEY
GTTAGATGCTATATTTTCAACTGATGGTGATGGAGATCGCCCTCTGATAGCTGACGAATA 10680 

GNWLRGDILGLLCSLELAAD
TGGAAATTGGTTAAGAGGAGATATATTAGGCCTTCTGTGCTCTCTCGAATTAGCTGCTGA 10740 

AVAIPVSCNSTISSGNFFKH
TGCAGTCGCTATTCCTGTAAGCTGCAACAGTACAATCTCATCTGGTAACTTTTTTAAACA 10800

VERTKIGSPYVIAAFAKLSA 
TGTGGAACGAACAAAGATTGGTTCACCCTATGTGATTGCAGCATTTGCTAAATTATCTGC 10860

NYNCIAGFEANGGFLLGSDV
AAACTATAATTGTATAGCTGGTTTTGAAGCGAATGGTGGCTTTCTGCTAGGTAGCGATGT 10920 

YINQRLLKALPTRDALLPAI 
TTATATTAATCAGCGTTTACTTAAGGCATTACCAACACGTGATGCTTTATTACCTGCCAT 10980 

MLLFGSKDKSISELVKKLPA
TATGCTTCTGTTTGGTAGCAAGGACAAAAGTATTAGTGAGCTTGTTAAAAAACTTCCTGC 11040

RYTYSNRLQDISVKTSMSLI
TCGCTATACCTATTCAAACAGATTACAGGATATAAGTGTTAAAACAAGTATGTCTTTAAT 11100

NLGLTDQEDFLQYIGFNKHH
AAATCTTGGTCTGACAGATCAAGAGGATTTTTTGCAGTATATTGGTTTTAATAAACATCA 11160 

ILHSDVTDGFRITIDNNNII
TATATTACATTCTGATGTTACTGATGGCTTTAGAATCACTATCGATAACAACAATATTAT 11220

HLRPSGNAPELRCYAEADSQ 
TCATTTACGACCTTCAGGCAATGCCCCTGAGTTGCGTTGCTATGCGGAGGCTGACTCGCA 11280 

EDACNIVETVLSNIKSKLGR 
AGAGGATGCATGTAATATTGTTGAAACTGTTCTCTCTAATATCAAAAGCAAACTGGGTAG 11340

End of manB 

A * 

AGCTTAATGCTGTTGATAATAGAGCGTTTCTTTCCAGTAATACTTTGTCTGGTTATCTGG 11400

Start of wbaP 

MDRFDNKYNPNL 
TACCCAAGTTGAGGGTGAGAATTAA ATGGATCGTTTTGATAATAAGTATAACCCAAATTT 11460 

CKILLAISDLLFFNVALWAS 
ATGCAAAATATTATTGGCTATATCAGATTTACTGTTTTTTAATGTAGCCTTATGGGCATC 11520
LGVVYLIFDEVQRFVPQEQL
GTTAGGAGTTGTATATTTAATCTTTGATGAAGTTCAGCGATTTGTACCACAAGAGCAATT 

DNRFISHFILSIVCVGWFWV 
AGATAATCGATTTATATCACATTTTATTCTATCTATAGTATGCGTTGGATGGTTTTGGGT 

RLRHYTYRKPFWYELKEVIR 
TCGACTGCGTCACTATACATATCGAAAGCCATTCTGGTATGAGTTGAAAGAGGTTATTCG 

TIVIFAVFDLALIAFTKWQF 
TACTATCGTTATTTTTGCTGTGTTTGATTTGGCTTTAATTGCGTTTACAAAATGGCAGTT 

SRYVWVFCWTFAIILVPFFR
TTCACGCTATGTCTGGGTGTTTTGTTGGACTTTTGCCATAATCCTGGTGCCTTTTTTTCG 

ALTKHLLNKLGIWKKKTIIL
CGCACTTACAAAGCATTTATTGAACAAGCTAGGTATCTGGAAGAAAAAAACTATCATCCT 

GSGQNARGAYSALQSEEMMG
TGGGAGCGGACAGAATGCTCGTGGTGCATATTCTGCGCTGCAAAGTGAGGAGATGATGGG 

FDVIAFFDTDASDAEINMLP
GTTTGATGTTATCGCTTTTTTTGATACGGATGCGTCAGATGCTGAAATAAATATGTTGCC 

VIKDTETIWDLNRTGDVHYI
GGTGATAAAGGACACTGAGACTATTTGGGATTTAAATCGTACAGGTGATGTCCATTATAT 

LAYEYTELEKTHFWLRELSK 
CCTTGCTTATGAATACACCGAGTTGGAGAAAACACATTTTTGGCTACGTGAACTTTCAAA 

HHCRSVTVVPSFRGLPLYNT
ACATCATTGTCGTTCTGTTACTGTCGTCCCCTCGTTTAGAGGATTGCCATTATATAATAC 

DMSFIFSHEVMLLRIQNNLA
TGATATGTCTTTTATCTTTAGCCATGAAGTTATGTTATTAAGGATACAAAATAACTTGGC 

KRSSRFLKRTFDIVCSIMIL
TAAAAGGTCGTCCCGTTTTCTCAAACGGACATTTGATATTGTTTGTTCAATAATGATTCT 

IIASPLMIYLWYKVTRDGGP
TATAATTGCATCACCACTTATGATTTATCTGTGGTATAAAGTTACTCGAGATGGTGGTCC 

AIYGHQRVGRHGKLFPCYKF 
GGCTATTTATGGTCACCAGCGAGTAGGTCGGCATGGAAAACTTTTTCCATGCTACAAATT 

RSMVMNS
TCGTTCTATGGTTATGAATTC 12441
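Each amino acid line in this listing gives the one-letter translation, in the reading frame of the annotated gene, of the nucleotide line beneath it. Because the listing wraps at 60 nucleotides, a codon can straddle two lines, so a line may begin with the tail of the previous codon. The annotations can be spot-checked by translating with the standard genetic code. The sketch below is not part of the patent: `translate`, `CODON_TABLE` and the example variable names are illustrative, the table is assumed to be the standard code (NCBI translation table 1), and any codon containing a non-ACGT character (such as an OCR artefact) is rendered as `X`. It is applied to the start of wbaP shown above.

```python
# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate `dna` starting at offset `frame` (0, 1 or 2).

    Whitespace is removed first, a trailing partial codon is ignored, and any
    codon containing a non-ACGT character becomes 'X'.
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


# Start of wbaP: end of the line at position 11460 plus the first "A" of the
# next line, which completes the TTA (Leu) codon.
wbap_start = "ATGGATCGTTTTGATAATAAGTATAACCCAAATTT" + "A"
print(translate(wbap_start))  # MDRFDNKYNPNL

# Line ending at 11520, read in frame 1: its first nucleotide belongs to the
# previous codon, and the trailing "TC" continues onto the next line.
line_11520 = "ATGCAAAATATTATTGGCTATATCAGATTTACTGTTTTTTAATGTAGCCTTATGGGCATC"
print(translate(line_11520, frame=1))  # CKILLAISDLLFFNVALWA
```

The second call reproduces the annotated residues CKILLAISDLLFFNVALWA; the final Ser of that amino acid line is encoded by a codon that continues onto the next nucleotide line, so it is not produced from this line alone.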
GAATTCGGGAGGCGCAATGAAAGTCAGCTTTTTTCTGCTGAAATTTCCACTCTCATCGGA

AACCTTTGTGCTGAATCAGATTACTGCGTTTATTGATATGGGCCATGAGGTGGAGATTGT 

CGCGTTACAAAAAGGCGATACCCAACATACTCACGCCGCCTGGGAGAAGTATGGCCTGGC 

GGCGAAAACCCGCTGGTTACAGGATGAGCCCCAGGGACGGCTGGCGAAACTGCGCTACCG 

GGCATGTAAAACGCTGCCGGGGCTGCATCGGGCGGCGACCTGGAAAGCGCTCAATTTTAC 

CCGCTATGGCGATGAATCACGCAATTTGATCCTTTCCGCGATTTGCGCGCAGGTGAGCCA 

GCCTTTTGTGGCGGATGTGTTTATCGCACACTTTGGTCCGGCGGGCGTGACGGCGGCCAA 

ACTACGCGAACTGGGCGTGCTTCGCGGCAAAATCGCGACTATTTTCCACGGGATTGATAT 

CTCTAGTCGTGAGGTGCTCAGTCATTACACGCCGGAGTATCAGCAGTTGTTTCGTCGTGG 

CGATCTGATGCTGCCCATCAGCGATCTGTGGGCCGGTCGCCTGAAAAGTATGGGCTGTCC 

GCCGGAAAAGATTGCCGTTTCGCGCATGGGCGTCGACATGACGCGTTTTACCCATCGTTC 

GGTGAAAGCGCCAGGGATGCCGCTGGAGATGATTTCCGTCGCGCGCCTGACAGAAAAAAA 

AGGCCTGCATGTGGCGATTGAAGCCTGTCGGCAACTGAAAGCACAGGGCGTGGCGTTTCG 

CTACCGCATTCTGGGGATTGGCCCGTGGGAACGTCGGCTGCGCACGCTCATCGAGCAGTA 

TCAGCTAGAGGATGTCATTGAGATGCCGGGGTTTAAACCGAGCCATGAAGTGAAGGCGAT 

GCTGGATGACGCCGATGTTTTTTTGCTGCCGTCGATTACCGGTACGGATGGCGATATGGA 

AGGTATTCCGGTAGCGCTGATGGAGGCGATGGCGGTAGGGATTCCCGTGGTATCTACCGT 

GCATAGCGGTATTCCGGAACTGGTGGAGGCCGGCAAATCCGGCTGGCTGGTGCCGGAAAA 

CGATGCGCAGGCGCTGGCGGCCCGACTCGCTGAGTTCAGCCGGATTGACCACGACACGCT 

GGAGTCGGTGATCACGCGCGCCCGTGAAAAAGTGGCGCAAGATTTTAATCAGCAGGCGAT 

TAATCGCCAGTTAGCCAGCCTGCTACAAACGATATAAACGAGGTGGTATGCCCGCGACTA 

AATTCTCCCGACGTACCCTCCTGACGGCAGGTTCTGCGCTTGCTGTTCTTCCTTTTCTGC 

GCGCCTTGCCGGTACAGGCGCGTGAACCTCGCGAGACCGTCGATATTAAGGATTATCCGG 

CGGATGACGGTATCGCCTCGTTCAAACAGGCCTTCGCCGACGGACAGACCGTGGTCGTAC 

CGCCAGGATGGGTGTGTGAAAATATCAATGCGGCGATAACGATTCCGGCGGGAAAAACGC 

TGCGGGTACAGGGCGCGGTGCGTGGGAATGGCCGGGGACGGTTTATTTTGCAGGACGGGT 

GTCAGGTGGTGGGGGAGCAGGGCGGCAGTCTGCACAATGTGACGCTGGATGTTCGCGGGT 

CGGACTGTGTGATTAAAGGCGTGGCGATGAGCGGCTTTGGCCCCGTCGCGCAAATTTTCA 

TCGGTGGTAAGGAACCGCAGGTGATGCGTAATCTCATTATCGATGACATCACCGTTACCC 

ACGCCAACTACGCCATTCTCCGCCAGGGATTTCATAACCAAATGGATGGCGCGCGGATTA 

CGCATAGCCGCTTTAGCGATTTACAGGGGGACGCCATTGAGTGGAATGTCGCGATTCACG 

ACCGCGACATCCTGATTTCCGATCATGTCATCGAACGCATTAATTGTACCAATGGCAAAA 

TCAACTGGGGGATCGGCATCGGGCTGGCGGGTAGCACCTATGACAACAGTTATCCTGAAG 
ACCAGGCAGTAAAAAACTTTGTGGTGGCCAATATTACCGGATCTGATTGCCGACAGCTTG

TGCACGTAGAAAATGGCAAACATTTCGTCATTCGCAATGTCAAAGCCAAAAACATCACGC 

CCGGTTTCAGTAAAAATGCGGGTATTGATAACGCAACGATCGCAATTTATGGCTGTGATA 

ATTTCGTCATTGATAATATTGATATGACGAATAGTGCCGGGATGCTCATCGGCTATGGCG 

TCGTTAAAGGAAAATACCTGTCAATTCCGCAAAACTTTAAATTAAACGCTATTCGGTTGG 

ATAATCGCCAGGTTGCTTATAAATTACGCGGCATTCAAATTTCCTCCGGCAACACCCCCT 

CTTTTGTCGCCATCACCAATGTACGGATGACGCGTGCTACGCTGGAACTGCATAATCAAC 

CGCAGCACCTCTTTCTGCGCAATATCAACGTGATGCAAACTTCAGCGATTGGCCCGGCGT 

TAAAAATGCATTTCGATTTGCGTAAAGATGTACGTGGTCAATTTATGGCCCGCCAGGACA 

CGCTGCTTTCCCTCGCTAATGTTCATGCCATCAATGAAAACGGGCAGAGTTCCGTGGATA 

TCGACAGGATTAATCACCAAACCGTGAATGTCGAAGCAGTGAATTTTTCGCTGCCGAAGC 

GGGGAGGGTAAGTACCGCTATTTTTACGAAAATTCCTGGGAAAAAGTTGTTCATACTTAA 

TGTTATGGTGCCGACTAAGACGTAATGTAGAGCGTGCCATCATTATCCCTGGCAGCAGAG 

TAATTCATGCTGGCGAAAACAAGCTAAAGAGCTATAATTCAGCAACCATTTTACAGGTGG 

AAGAAACAATGATGAATTTGAAAGCAGTTATACCGGTAGCGGGTTTGGGTATGCATATGT 

TGCCTGCCACCAAGGCAATCCCAAAAGAGATGCTACCGATCGTCGACAAGCCAATGATTC 

AGTACATTGTCGATGAGATTGTGGCTGCAGGGATCAAAGAAATCGTGCTGGTGACTCACG 

CGTCTAAAAACGCCGTTGAGAACCACTTCGACACCTCTTATGAACTTGAATCACTTCTTG 

AGCAGCGCGTTAAGCGTCAGCTTTTGGCGGAAGTGCAATCTATCTGCCCACCGGGCGTGA 

CGATTATGAACGTTCGCCAGGCGCAGCCGTTAGGGCTGGGGCATTCTATTCTGTGCGCGC 

GTCCGGTCGTGGGCGATAACCCTTTCATTGTGGTACTCCCGGATATTATTATCGATGATG 

CTACCGCCGATCCGCTGCGCTATAACCTTGCGGCGATGGTGGCGCGTTTCAATGAAACGG 

GTCGCAGCCAGGTGCTGGCGAAGCGCATGAAAGGTGATTTATCGGAGTATTCCGTTATCC 

AGACGAAAGAACCTCTGGATAATGAAGGCAAAGTCAGCCGGATTGTGGAGTTTATCGAAA 

AACCGGATCAGCCGCAGACGCTGGATTCCGATTTGATGGCGGTAGGCCGTTATGTGCTTT 

CAGCCGACATCTGGGCGGAACTGGAAAGAACCGAACCGGGCGCCTGGGGCCGCATCCAGC 

TCACCGATGCCATTGCTGAACTGGCGAAAAAACAGTCGGTTGACGCGATGCTAATGACGG 

GTGACAGCTATGACTGCGGTAAAAAAATGGGCTACATGCAGGCATTTGTGAAGTACGGGC 

TGCGCAACCTGAAAGAAGGAGCCAAGTTCCGTAAGAGCATAGAGCAGCTTTTGCATGAAT 

AAGTATTAACAACCGTGATAAATGGTTGGTGATAAACATAATAACGGCAGTGAACATTCG 

AAGCGGCAAGTTGGCTGAAACGAGTGTTGACTGCCGTTTTAGTTTTGTATAAAGGGCTTA 

AGTAACAAGGGGTTATCTGGAGCATTTTAATGCTGATTTTATAAGATTAATCCTTGTTTC 

CGGATGCAATTAATAAGACAATTAGCGTTTAAGTTTTAGTGAGCTTTGCCCTGCTGGGCG 
AGGTTTGCAACAAGTCGATATGTACGCAGTGCACTGGTAGCTGATGAGCCAGGGGCGGTA
GCGTGTGTAACGACTTGAGCAATTAATTTTTATTGGCAAATTAAATACCACATTAAATAC 

Start of rmlB 

VKILITGGAGFIGS
GCCTTATGGAATAGAAAAGTGAAGATACTTATTACTGGCGGGGCAGGTTTTATTGGATCA 



4020 
4080 

4140 



AVVRHIIKNTQDTVVNIDKL
GCTGTTGTCCGCCATATTATTAAGAATACACAGGACACTGTAGTTAATATTGATAAATTA 4200 

TYAGNLESLSDISESNRYNF
ACCTACGCCGGTAATCTTGAATCCCTTTCTGATATTTCTGAAAGTAATCGCTACAATTTT 4260

EHADICDSAEITRIFEQYQP
GAACACGCGGATATTTGTGATTCCGCTGAAATAACGCGTATTTTTGAGCAGTACCAGCCG 4320

DAVMHLAAESHVDRSITGPA
GACGCGGTGATGCATTTGGCTGCGGAAAGTCATGTGGACCGTTCGATTACCGGGCCAGCA 4380

AFIETNIVGTYALLEVARKY
GCATTTATTGAAACCAATATCGTCGGCACCTATGCACTTCTTGAAGTTGCGCGTAAATAC 4440

WSALGEDKKNNFRFHHISTD
TGGTCTGCCCTTGGCGAAGATAAAAAAAATAATTTTCGTTTTCATCATATTTCCACTGAT 4500 

EVYGDLPHPDEVENSVTLPL
GAAGTTTACGGCGATTTACCGCATCCTGATGAAGTTGAAAACAGCGTTACGCTGCCGTTA 4560 

FTETTAYAPSSPYSASKASS
TTTACTGAAACGACGGCATATGCGCCAAGTAGCCCCTATTCTGCGTCAAAAGCATCCAGC 4620

DHLVRAWRRTYGLPTIVTNC 
GATCATTTAGTCCGTGCCTGGCGGCGTACCTATGGTCTACCAACGATCGTTACCAATTGT 4680

SNNYGPYHFPEKLIPLVILN 
TCTAATAACTATGGCCCTTATCACTTCCCTGAAAAACTGATTCCGTTGGTCATTTTGAAC 4740

ALEGKPLPIYGKGDQIRDWL 
GCACTGGAAGGAAAGCCTTTGCCAATTTATGGCAAAGGGGATCAGATTCGCGATTGGCTA 4800

YVEDHARALHMVVTEGKAGE 
TATGTAGAAGATCATGCTCGCGCGCTTCATATGGTAGTGACTGAAGGCAAGGCAGGGGAG 4860

TYNIGGHNEKKNLDVVFTIC 
ACTTATAACATTGGTGGACACAATGAGAAGAAAAATCTCGATGTGGTATTTACCATCTGT 4920

DLLDEIVPKATSYREQITYV 
GATCTGCTGGATGAGATTGTACCCAAAGCGACTTCTTATCGTGAACAAATCACTTATGTC 4980

ADRPGHDRRYAIDAGKISRE 
GCGGATCGTCCGGGCCATGATCGTCGTTATGCCATTGATGCAGGTAAAATTAGCCGCGAA 5040

LGWKPLETFESGIRKTVEWY
TTAGGCTGGAAACCGCTGGAGACCTTTGAAAGCGGTATTCGTAAAACAGTGGAATGGTAC 5100

LANTQWVNNVKSGAYQSWIE 
CTTGCAAATACTCAATGGGTAAACAATGTTAAAAGTGGGGCGTATCAGAGTTGGATAGAA 5160

End of rmlB Start of rmlD 

QNYEGRQ* 

MNILLFGKTGQV
CAGAACTATGAAGGACGCCAG TAATGAATATCTTACTTTTTGGTAAGACAGGGCAAGTAG 5220
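At the rmlB/rmlD boundary above, the rmlB stop codon TAA and the rmlD start codon ATG share a nucleotide (TAATG). At the manC/manB boundary (the line ending at position 18840), the annotated manB ATG begins inside the 3' end of manC. The overlap can be computed from the annotated positions. The sketch below is illustrative only: `first_in_frame_stop` and `junction` are hypothetical helpers, and the 0-based index of the downstream ATG is read off the annotation by hand rather than predicted. Both example lines happen to begin in frame 0 of the upstream gene.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq: str, frame: int = 0) -> int:
    """Return the index of the first stop codon read from offset `frame`, or -1."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


def junction(seq: str, upstream_frame: int, downstream_start: int) -> Tuple[int, int]:
    """Return (stop_index, overlap_nt) for an upstream stop and a downstream ATG.

    `downstream_start` is the 0-based index of the downstream gene's ATG, taken
    from the annotation; this sketch does not predict start codons.
    """
    seq = "".join(seq.split()).upper()
    if seq[downstream_start:downstream_start + 3] != "ATG":
        raise ValueError("no ATG at downstream_start")
    stop = first_in_frame_stop(seq, upstream_frame)
    if stop < 0:
        raise ValueError("no in-frame stop codon in this window")
    # Nucleotides from the downstream ATG to the end of the upstream stop codon
    # (0 if the two genes do not overlap).
    overlap = max(0, stop + 3 - downstream_start)
    return stop, overlap


# rmlB -> rmlD (line ending at 5220): ...CAG TAA, then ATG AAT ATC ...
rmlb_rmld = "CAGAACTATGAAGGACGCCAGTAATGAATATCTTACTTTTTGGTAAGACAGGGCAAGTAG"
print(junction(rmlb_rmld, 0, 23))  # (21, 1): TAA and ATG share one A

# manC -> manB (line ending at 18840): manB's ATG begins inside manC.
manc_manb = "CAACGTTCTGGATTTTCGAAGGAGTGGACTAATGAACGTAGTTAATAATAGCCGTGATGT"
print(junction(manc_manb, 0, 31))  # (42, 14)
```

Passing the start index explicitly keeps the sketch honest about what comes from the annotation. A naive "first ATG in the line" search would pick the ATG inside the rmlB codon TAT GAA at index 7 instead of the rmlD start.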
GWELQRSLAPVGNLIALDVH
GCTGGGAGTTGCAACGTTCTCTGGCACCGGTAGGGAATCTGATTGCCCTGGATGTCCATT 

SKEFCGDFSNPKGVAETVRK
CAAAAGAGTTTTGCGGTGATTTTAGTAATCCGAAAGGCGTTGCCGAAACCGTTCGTAAGC 

LRPDVIVNAAAHTAVDKAES 
TTCGTCCCGATGTGATTGTTAACGCAGCAGCCCATACTGCAGTAGATAAAGCAGAGTCTG 

EPELAQLLNATSVEAIAKAA
AACCAGAACTGGCGCAGTTACTTAACGCCACCAGTGTGGAAGCCATCGCTAAAGCAGCCA 

NETGAWVVHYSTDYVFPGTG 
ACGAAACTGGCGCATGGGTAGTGCATTATTCAACCGATTATGTATTTCCTGGTACCGGCG 

DIPWQETDATSPLNVYGKTK
ATATCCCATGGCAGGAAACGGACGCTACGTCGCCGCTGAATGTCTATGGCAAAACCAAAC 

LAGEKALQDNCPKHLIFRTS
TGGCGGGAGAAAAGGCCCTGCAGGATAACTGCCCTAAACACCTTATCTTCCGCACCAGTT 

WVYAGKGNNFAKTMLRLAKE 
GGGTTTATGCAGGTAAGGGCAATAATTTCGCAAAGACAATGCTTCGTCTGGCGAAAGAGC 

RQTLSVINDQYGAPTGAELL
GTCAGACACTTTCAGTCATTAACGATCAGTACGGTGCGCCAACCGGTGCGGAATTACTGG 

ADCTAHAIRVALNKPEVAGL 
CTGACTGTACGGCGCATGCGATCCGTGTGGCGTTAAATAAACCAGAAGTCGCAGGTCTTT 

YHLVAGGTTTWHDYAALVFD 
ACCATCTGGTTGCCGGGGGAACCACAACCTGGCATGACTACGCGGCCTTAGTCTTTGACG 

EARKAGITLALTELNAVPTS 
AGGCGCGCAAAGCAGGGATAACGCTTGCGCTGACTGAGCTTAATGCTGTGCCGACCAGCG 

AYPTPASRPGNSRLNTEKFQ 
CCTACCCGACGCCGGCGAGCAGACCAGGCAATTCGCGTCTCAATACTGAAAAGTTTCAGC 

RNFDLILPQWELGVKRMLTE
GTAATTTTGACCTTATTCTGCCTCAATGGGAATTAGGAGTTAAGCGTATGCTGACTGAAA 

End of rmlD 

MFTTTTI * 

TGTTTACGACGACAACCATC TAATAAATTTAAATGCCCATCAGGGCATTTTCTATGAATG

Start of rmlA

MKTRKGIILAGGSGTRL
AGAAATGGAAATGAAAACGCGTAAGGGCATTATTTTAGCGGGGGGCTCCGGCACCCGTCT 



YPVTMAVSKQLLPIYDKPMI
TTATCCGGTGACCATGGCGGTAAGTAAGCAATTGCTACCAATTTATGATAAACCGATGAT 

YYPLSTLMLAGIRDILIIST
TTACTATCCCCTTTCCACGCTTATGCTGGCAGGCATTCGGGATATCCTGATCATCAGTAC 

PQDTPRFQQLLGDGSQWGLN 
GCCACAGGACACGCCGCGTTTTCAACAACTGCTGGGAGACGGCAGCCAGTGGGGGCTGAA 

LQYKVQPSPDGLAQAFIIGE
TCTTCAATATAAAGTACAGCCAAGCCCGGATGGCTTAGCACAGGCGTTTATTATTGGTGA 

EFIGHDDCALVLGDNIFYGH
AGAGTTCATTGGTCATGATGATTGTGCATTAGTGCTGGGTGACAATATCTTCTATGGTCA 
DLPKLMEAAVNKESGATVFA
TGATTTACCAAAGTTAATGGAAGCTGCCGTTAATAAAGAAAGTGGTGCTACCGTCTTCGC 

YHVNDPERYGVVEFDQKGTA 
TTATCATGTAAACGATCCGGAGCGCTACGGTGTGGTTGAGTTTGACCAAAAGGGCACAGC 

VSLEEKPLQPKSNYAVTGLY 
CGTTAGTCTGGAAGAAAAACCATTACAACCGAAGAGTAATTACGCGGTAACGGGGCTGTA 

FYDNSVVEMAKNLKPSARGE 
TTTTTATGATAATAGCGTGGTGGAGATGGCGAAAAATCTTAAGCCTTCCGCTCGCGGTGA 

LEITDINRIYMEQGRLSVAM 
GTTAGAAATCACGGATATTAACCGTATCTATATGGAGCAGGGAAGATTGTCTGTCGCTAT 

MGRGYAWLDTGTHQSLIEAS 
GATGGGGCGCGGTTATGCCTGGCTGGATACAGGGACGCATCAGAGTTTGATAGAGGCCAG 

NFIATIEERQGLKVSCPEEI 
TAATTTTATTGCAACCATCGAAGAACGCCAGGGGCTAAAAGTGTCCTGCCCGGAAGAGAT 

AFRKNFINAQQVIELAGPLS 
CGCATTTCGTAAAAATTTTATAAATGCACAACAGGTTATAGAACTGGCCGGGCCATTATC 
End of rmlA Start of rmlC

KNDYGKYLLKMVKGL * VMIV
AAAAAATGATTATGGCAAATATTTGCTGAAGATGGTGAAAGGTTTA TAAGTG ATGATTGT 7020



IKTAIPDVLILEPKVFGDER 
GATTAAAACAGCAATACCAGATGTCTTGATCTTAGAGCCTAAAGTTTTTGGCGATGAGAG 

GFFFESYNQQTFEELIGRKV 
GGGATTCTTTTTTGAAAGTTATAACCAGCAGACCTTTGAAGAGTTGATTGGACGTAAAGT 

TFVQDNHSKSKKNVLRGLHF 
TACATTTGTTCAAGATAATCATTCAAAATCCAAAAAGAACGTACTCAGAGGGCTACATTT 

QRGENAQGKLVRCAVGEVFD 
TCAGAGAGGAGAAAATGCACAGGGGAAGTTAGTTCGTTGTGCTGTCGGTGAGGTTTTTGA 

VAVDIRKESPTFGQWVGVNL
TGTTGCGGTCGATATCCGAAAAGAATCGCCTACTTTTGGTCAATGGGTTGGTGTAAATCT 

SAENKRQLWIPEGFAHGFVT
GTCTGCTGAGAATAAGCGACAGCTTTGGATTCCAGAAGGTTTTGCTCATGGTTTTGTTAC 

LSEYAEFLYKATNYYSPSSE 
TCTTAGTGAGTATGCAGAGTTTCTGTACAAAGCAACTAATTATTACTCACCTTCATCGGA 

GSILWNDEAIGIEWPFSQLP 
AGGTAGCATTCTATGGAATGATGAGGCAATAGGTATTGAATGGCCTTTTTCTCAGCTGCC 

End of rmlC

ELSAKDAAAPLLDQALLTE* 
TGAGCTTTCAGCAAAAGATGCTGCAGCACCTTTACTGGATCAAGCCTTGTTAACAGAG TA 7560

Start of ddhD 

VSHIIKIFPSNIEFSGRE 
AGCATCGTGTCTCATATTATTAAGATTTTTCCATCAAATATTGAATTTTCCGGTAGAGAG 7620

DESILDAALSAGIHLEHSCK 
GATGAATCAATCCTCGATGCTGCGCTATCGGCTGGTATCCATCTTGAACATAGCTGCAAA 

AGDCGICESDLLAGEVVDSK 
GCGGGTGATTGTGGTATCTGTGAGTCCGATTTGTTGGCGGGAGAAGTTGTTGACTCCAAA 
GNIFGQGDKILTCCCKPKTA
GGTAATATTTTTGGACAGGGTGATAAAATACTAACCTGCTGCTGTAAACCTAAAACCGCC 

LELNAHFFPELAGQTKKIVP 
CTTGAGCTAAATGCGCATTTTTTTCCTGAACTAGCTGGACAGACAAAAAAAATTGTCCCA 

CKVNSAVLVSGDVMTLKLRT 
TGCAAGGTAAATAGTGCTGTACTGGTTTCAGGCGATGTTATGACTTTGAAGTTACGCACA 

PPTAKIGFLPGQYINLHYKG 
CCACCAACAGCAAAAATTGGCTTCCTTCCAGGGCAGTATATCAATTTACATTATAAAGGT 

VTRSYSIANSDESNGIELHV 
GTAACTCGCAGTTATTCTATCGCTAATAGTGATGAGTCGAATGGTATTGAGTTGCATGTA 

RNVPNGQMSSLIFGELQENT
AGGAATGTTCCCAATGGTCAGATGAGTTCGCTCATTTTTGGGGAGTTACAAGAAAATACT 

LMRIEGPCGTFFIRESDRPI
CTTATGCGCATTGAAGGGCCTTGCGGAACATTTTTTATTCGTGAAAGTGACAGACCTATA 

IFLAGGTGFAPVKSMVEHLI 
ATCTTCCTTGCAGGCGGTACTGGATTCGCTCCAGTTAAATCAATGGTTGAGCATCTCATT 

QGKCRREIYIYWGMQYSKDF
CAGGGAAAATGTCGTCGTGAGATCTACATTTACTGGGGAATGCAATATAGTAAAGATTTT 

YSALPQQWSEQHDNVHYIPV 
TACTCTGCATTACCGCAGCAGTGGAGTGAACAGCACGACAACGTTCATTATATCCCTGTT 

VSGDDAEWGGRKGFVHHAVM
GTTTCTGGTGATGACGCCGAATGGGGGGGAAGAAAGGGATTTGTCCATCATGCCGTGATG 

DDFDSLEFFDIYACGSPVMI
GATGATTTTGATTCTCTAGAGTTCTTCGATATATATGCATGTGGTTCACCTGTGATGATC 

DASKKDFMMKNLSVEHFYSD 
GATGCCAGTAAAAAGGACTTTATGATGAAAAATCTCTCTGTAGAACATTTCTATTCTGAT 

End of ddhD Start of ddhA 

AFTASNNIEDNL* 

MKAVILAG 
GCATTTACCGCATCTAATAATATTGAGGATAATTTATGAAAGCGGTCATCCTGGCTGGTG 

GLGTRLSEETIVKPKPMVEI 
GACTTGGTACCAGACTAAGTGAAGAAACAATTGTAAAACCAAAACCGATGGTAGAAATTG 

GGKPILWHIMKMYSVHGIKD 
GTGGCAAGCCTATTCTTTGGCACATTATGAAAATGTATTCTGTGCATGGTATCAAGGATT 

FIICCGYKGYVIKEYFANYF
TTATTATCTGCTGTGGTTATAAAGGATATGTGATTAAAGAATATTTTGCGAACTACTTCC 

LHMSDVTFHMAENRMEVHHK
TTCACATGTCAGATGTAACATTCCATATGGCTGAAAACCGTATGGAAGTTCACCATAAAC 

RVEPWNVTLVDTGDSSMTGG
GTGTTGAACCATGGAATGTCACATTGGTTGATACGGGTGATTCTTCAATGACTGGTGGTC 

RLKRVAEYVKDDEAFLFTYG
GTCTGAAACGTGTTGCTGAATACGTAAAAGATGACGAGGCTTTCCTGTTTACTTATGGTG 

DGVADLDIKATIDFHKAHGK 
ATGGCGTTGCCGACCTTGATATCAAAGCGACTATCGATTTCCATAAGGCTCACGGTAAGA 
KATLTATFPPGRFGALDIRA
AAGCGACTTTAACAGCTACTTTTCCACCAGGACGCTTTGGCGCATTAGATATCCGAGCTG 

GQVRSFQEKPKGDGAMINGG
GTCAGGTCCGGTCATTCCAGGAAAAACCGAAAGGCGATGGGGCAATGATCAATGGTGGTT 

FFVLNPSVIDLIDNDATTWE 
TCTTTGTGTTGAATCCATCGGTTATCGATCTCATCGATAACGATGCAACAACCTGGGAAC 

QEPLMTLAQQGELMAFEHPG 
AAGAGCCATTAATGACATTGGCACAACAGGGGGAGTTAATGGCTTTTGAACACCCAGGTT 

FWQPMDTLRDKVYLEGLWEK 
TCTGGCAGCCGATGGATACCCTACGTGATAAAGTTTACCTCGAAGGGCTGTGGGAAAAAG 

End of ddhA Start of ddhB 

MIDKNFWQG 

GKAPWKTWE* 

GTAAAGCTCCGTGGAAAACCTGGGAGTAACTAGATGATTGATAAAAATTTTTGGCAAGGT 

KRVFVTGHTGFKGSWLSLWL
AAACGTGTATTCGTTACCGGCCATACTGGCTTTAAAGGAAGCTGGCTTTCGCTATGGCTG 

TEMGAIVKGYALDAPTVPSL 
ACTGAAATGGGTGCAATTGTAAAAGGCTATGCACTTGATGCGCCAACTGTTCCAAGTTTA 

FEIVRLNDLMESHIGDIRDF
TTTGAGATAGTGCGTCTTAATGATCTTATGGAATCTCATATTGGCGACATTCGTGATTTT 

EKLRNSIAEFKPEIVFHMAA 
GAAAAGCTGCGCAATTCTATTGCAGAATTTAAGCCAGAAATTGTTTTCCATATGGCAGCC 

QPLVRLSYEQPIETYSTNVM 
CAGCCTTTAGTGCGCCTATCTTATGAACAGCCAATCGAAACATACTCAACAAATGTTATG 

GTVHLLETVKQVGNIKAVVN
GGTACTGTCCATTTGCTTGAAACAGTTAAGCAAGTAGGTAACATAAAGGCAGTCGTAAAT 

ITSDKCYDNREWVWGYRENE 
ATCACCAGTGATAAGTGCTACGACAATCGTGAGTGGGTGTGGGGCTATCGTGAGAACGAA 

PMGGYDPYSNSKGCAELVAS 
CCCATGGGAGGGTACGATCCATACTCTAATAGTAAAGGTTGTGCAGAATTAGTCGCGTCT 

AFRNSFFNPANYEQHGVGLA 
GCATTCCGGAACTCATTCTTCAATCCTGCAAATTATGAGCAACATGGCGTTGGTTTGGCG 

SVRAGNVIGGGDWAKDRLIP
TCTGTGAGGGCTGGTAATGTCATAGGCGGAGGCGATTGGGCTAAAGACCGTTTAATTCCC 

DILRSFENNQQVIIRNPYSI
GATATTCTGCGCTCATTTGAAAATAACCAGCAGGTTATTATTCGAAACCCATATTCTATC 

RPWQHVLEPLSGYIVVAQRL
CGTCCCTGGCAGCATGTACTGGAGCCTCTTTCTGGTTACATTGTGGTGGCGCAACGCTTA 

YTEGAKFSEGWNFGPRDEDA 
TATACAGAAGGTGCTAAGTTTTCTGAAGGATGGAATTTCGGCCCGCGTGATGAAGATGCG 

KTVEFIVDKMVTLWGDDASW
AAGACGGTCGAATTTATTGTTGACAAGATGGTCACGCTTTGGGGTGATGATGCAAGCTGG 

LLDGENHPHEAHYLKLDCSK 
TTACTGGATGGTGAGAATCATCCTCATGAGGCACATTACCTGAAACTGGATTGCTCTAAA 
ANMQLGWHPRWGLTETLGRI
GCAAATATGCAATTAGGATGGCATCCGCGTTGGGGATTGACTGAAACACTTGGTCGCATC 

VKWHKAWIRGEDMLICSKRE 
GTAAAATGGCATAAAGCATGGATTCGCGGCGAAGATATGTTGATTTGTTCAAAGCGTGAA 

End of ddhB 

ISDYMSATTR*
ATCAGCGACTATATGTCTGCAACTACTCGT TAAGAAAATAAGTTTAAGGAATCAAAGTAA 



MTANNLREQISQLVAQYANE
TGACAGCAAATAACCTGCGTGAGCAAATCTCTCAGCTTGTCGCTCAGTATGCGAATGAGG 

ALSPKPFVAGTSVVPPSGKV 
CATTGAGCCCGAAACCTTTTGTTGCAGGTACAAGCGTTGTGCCTCCTTCCGGGAAGGTTA 

IGAKELQLMVEASLDGWLTT
TTGGTGCCAAAGAGTTACAATTGATGGTTGAGGCGTCTCTTGATGGATGGCTAACTACTG 

GRFNDAFEKKLGEFIGVPHV
GTCGTTTCAATGATGCCTTTGAAAAAAAACTTGGGGAATTTATTGGGGTTCCTCATGTTT 

LTTTSGSSANLLALTALTSP 
TAACGACAACATCTGGCTCTTCGGCAAACTTGCTGGCACTGACTGCGCTGACTTCCCCAA 

KLGERALKPGDEVITVAAGF 
AATTAGGCGAGCGAGCTCTCAAACCTGGTGATGAGGTTATTACTGTCGCTGCTGGCTTCC 

PTTVNPAIQNGLIPVFVDVD 
CGACTACAGTTAACCCGGCGATCCAGAATGGTTTAATACCGGTATTCGTGGATGTTGATA 

IPTYNIDASLIEAAVTEKSK 
TCCCGACATATAATATCGATGCCTCTCTCATTGAAGCTGCAGTTACTGAGAAATCAAAAG 

AIMIAHTLGNAFNLSEVRRI
CGATAATGATCGCTCATACACTCGGTAATGCATTTAACCTGAGTGAAGTTCGTCGGATTG 

ADKYNLWLIEDCCDALGTTY
CCGATAAATATAACTTATGGTTGATTGAAGACTGCTGTGATGCCCTTGGGACGACTTATG 

EGQMVGTFGDIGTVSFYPAH
AAGGCCAGATGGTAGGTACCTTTGGTGACATCGGAACCGTTAGTTTTTATCCGGCTCACC 

HITMGEGGAVFTKSGELKKI 
ATATCACAATGGGTGAAGGCGGTGCTGTATTCACCAAGTCAGGTGAACTGAAGAAAATTA 

IESFRDWGRDCYCAPGCDNT 
TTGAGTCGTTCCGTGACTGGGGCCGGGATTGTTATTGTGCGCCAGGATGCGATAACACCT 

CGKRFGQQLGSLPQGYDHKY 
GCGGTAAACGTTTTGGTCAGCAATTGGGATCACTTCCTCAAGGCTATGATCACAAATATA 

TYSHLGYKLKITDMQAACGL
CTTATTCCCACCTCGGATATAATCTCAAAATCACGGACATGCAGGCAGCATGTGGTCTGG 

AQLERVEEFVEQRKANFSYL
CTCAGTTGGAGCGCGTAGAAGAGTTTGTAGAGCAGCGTAAAGCTAACTTTTCCTATCTGA 

KQGLQSCTEFLELPEATEKS 
AACAGGGCTTGCAATCTTGCACTGAATTCCTCGAATTACCAGAAGCAACAGAGAAATCAG 

DPSWFGFPITLKETSGVNRV 
ATCCATCCTGGTTTGGCTTCCCTATCACCCTGAAAGAAACTAGCGGTGTTAACCGTGTCG 
ELVKFLDEAKIGTRLLFAGN
AACTGGTGAAATTCCTTGATGAAGCAAAAATCGGTACACGTTTACTGTTTGCTGGAAATC



SGFIGKHLLEALKKSGISVV 
CCGGCTTTATTGGTAAGCATTTACTCGAAGCGCTAAAAAAATCGGGGATTTCAGTTGTCG 



ANVIKPLKLLDLAIKYRADI
CAAATGTTATAAAACCATTAAAGCTTCTTGATTTGGCAATAAAATATCGGGCGGATATCT 



YIITKRHFDEIGHYYANMHD 
ATATAATTACTAAAAGACACTTTGATGAAATTGGGCATTATTATGCTAATATGCATGACA 

ISFVNMRLEHVYGPGDGENK 
TTTCATTTGTAAACATGCGATTAGAGCATGTATATGGGCCTGGGGATGGTGAAAATAAAT 



LENRKEVPSYTEYQVGTGAG
TAGAAAATAGAAAAGAAGTACCTTCATATACTGAGTATCAAGTTGGAACTGGTGCTGGGG 



11580 



LIRQPYFANVKYRVVGELTN 

TGATTCGCCAACCGTATTTTGCTAATGTGAAATATCGTGTAGTGGGTGAGTTGACAAATA 11640

TDRIMNQTFWIGIYPGLTTE
CCGACCGTATAATGAATCAAACGTTCTGGATTGGTATTTATCCAGGCTTGACTACAGAGC 11700

End of ddhC 

HLDYVVSKFEEFFGLNF*

ATTTAGATTATGTAGTTAGCAAGTTTGAAGAGTTCTTTGGTTTGAATTTC TAATTCAATT 11760

Start of abe

MTFLKEYVIVSGA 

TATTCTATCTGGTGATTGCG ATGACCTTTTTGAAAGAATATGTAATTGTCAGTGGGGCTT 11820



11880 



AITRDVIKNNSNALANVRWC 
CAATCACTCGAGATGTAATAAAAAATAATAGTAATGCATTAGCTAATGTTAGATGGTGCA 11940 

SWDNIELLVEELSIDSALIG 
GTTGGGATAATATCGAATTATTAGTCGAGGAGTTATCAATTGATTCTGCATTAATTGGTA 12000

IIHLATEYGHKTSSLINIED
TCATTCATTTGGCAACAGAATATGGGCATAAAACATCATCTCTCATAAATATTGAAGATG 12060



12120 



FLNTDSFFAKKDFNYQHMRP 
TTTTAAATACAGATAGTTTTTTTGCCAAGAAAGATTTTAATTATCAACATATGCGGCCTT 12180



12240 
12300 



FIPYIIDCLNKKQSCVKCTT
TTATTCCATACATTATCGACTGCTTAAATAAAAAACAGAGTTGCGTGAAATGTACAACAG 12360

GEQIRDFIFVDDVVNAYLTI
GCGAACAGATAAGAGACTTTATTTTTGTAGATGATGTGGTAAATGCTTATTTAACTATAT 12420



12480 



VSLKDFLVYLQNTMMPGSSS 
TAAGTTTGAAAGATTTTCTGGTTTATTTGCAAAATACTATGATGCCAGGTTCATCGAGTA 12540

IFEFGAIEQRDNEIMFSVAN 
TATTTGAATTTGGTGCGATAGAGCAAAGAGATAATGAAATAATGTTCTCTGTAGCAAATA 12600 

NKNLKAMGWKPNFDYKKGIE
ATAAAAATTTAAAAGCAATGGGCTGGAAACCAAATTTCGATTATAAAAAAGGAATTGAAG 12660

End of abe 

ELLKRL*

AACTACTGAAACGGTTA TGAGATTTTCATGATCTTTTAATAAATAAATCGTTAACAAATT 12720

Start of wzx 

VKVQLL

AGTCGCGTTATGTTGTAAAAACTAAGTCGTTTAATTGCATAGTGAAAGTTCAATTGTTAA 12780 
KIPSHLIVAGSSWLSKIIIA
AAATTCCGAGTCATTTAATTGTTGCAGGTTCATCATGGTTATCCAAAATAATAATTGCCG 

GVQLASISYLISMLGEEKYA
GGGTGCAGTTAGCAAGTATTTCATATCTTATTTCTATGCTAGGTGAAGAGAAATATGCAA 

TFSLLTGLLVWCSAVDFGIG
TCTTTAGTTTGTTAACTGGTTTATTAGTATGGTGTAGCGCTGTTGATTTTGGCATAGGTA 

TGLQNYISECRAKNKSYDAY
CAGGACTGCAAAATTATATATCAGAATGCAGAGCCAAAAACAAAAGTTATGATGCATATA 

IKSALHLSFIAI IFFIALFY 
TTAAATCAGCATTACATCTAAGCTTTATAGCTATTATTTTTTTTATTGCTTTATTTTATA 

TFSGVISAKYLSSFHEVLQD
TTTTTTCTGGGGTAATTTCCGCTAAATATCTTTCTTCTTTTCATGAGGTATTACAGGACA 

KTRMLFFTSCLVFSSIGIGA 
AAACCAGAATGCTCTTTTTTACCTCATGTCTGGTTTTCAGTTCTATTGGAATCGGAGCTA 

IAYKILFAELVGWKANLLNA
TTGCTTATAAAATACTTTTTGCCGAATTGGTCGGGTGGAAAGCTAATCTATTAAACGCAT 

LSYMIGMLGLLYIYYRGISV 
TATCTTATATGATAGGTATGCTCGGCTTGCTATATATATACTATAGGGGGATCTCAGTTG 

DIKLSLIVLYLPVGMISLCY 
ACATAAAATTATCACTAATAGTCCTGTATCTTCCAGTGGGTATGATTTCATTGTGCTATA 

IVYRYIKLYHVKTTKSHYIA 
TTGTATATAGATACATAAAGCTTTATCATGTTAAAACAACAAAATCTCATTATATAGCAA 

TLRRSSGFFLFTLLSIVVLQ 
TTTTACGTAGATCTTCAGGGTTTTTTCTTTTTACTTTATTATCGATAGTGGTGCTTCAAA 

TDYMVISQRLTPADIVQYTV 
CAGATTATATGGTCATTTCTCAAAGGCTAACTCCTGCTGATATTGTTCAATATACAGTAA 

TMKIFGLVFFIYTAILQALW 
CGATGAAAATTTTTGGTTTAGTCTTTTTTATTTATACTGCTATTTTGCAAGCATTATGGC 

PICAELRVKQQWKKLNKMIG 
CTATATGTGCTGAATTGAGAGTCAAACAGCAATGGAAAAAACTTAACAAAATGATAGGTG 

VNILLGSLYVVGCTIFIYLF 
TCAATATTTTGCTTGGCTCACTATATGTTGTTGGATGTACAATATTTATTTATTTATTTA 

KEQIFSVIAKDINYQVSILS
AAGAACAGATATTTTCAGTAATAGCCAAAGATATTAATTATCAAGTTTCTATTTTATCTT 

FMLIGIYFCIRVWCDTYAML 
TTATGTTAATTGGCATATATTTCTGTATTCGCGTTTGGTGTGACACTTATGCAATGTTAT 
LQSMNYLKILWILVPLQAII
TGCAAAGTATGAATTATTTAAAAATACTTTGGATATTAGTACCACTACAAGCAATAATTG 

GGIAQWYFSSTLGISGVLLG
GTGGAATAGCACAATGGTATTTTTCTAGTACGCTTGGAATCAGTGGAGTGCTGCTTGGCT 



13980 



LIISFALTVFWGLPLTYLIK
TGATTATATCTTTTGCTTTAACTGTTTTTTGGGGGCTTCCACTAACTTACTTAATTAAGG 14040
End of wzx Start of wbaV

ANKG* MLISFCIPTYNRKQ
CAAATAAGGGATAATCATATGCTTATATCATTTTGTATTCCAACTTATAATAGAAAACAA 14100 



YLEELLNSINNQEKFNLDIE
TATCTTGAAGAGTTGTTGAATAGTATAAATAATCAGGAAAAATTTAATTTAGATATTGAG 

ICISDNASTDGTEEMIDVWR
ATATGTATATCAGATAATGCCTCTACTGATGGTACAGAGGAAATGATTGATGTTTGGAGG 

NNYNFPIIYRRNSVNLGPDR 
AACAATTATAATTTCCCAATAATATATCGGCGTAATAGCGTTAACCTTGGGCCAGATAGG 

NFLASVSLANGDYCWIFGSD 
AATTTTCTTGCTTCAGTATCCCTTGCGAATGGGGATTATTGTTGGATATTTGGCAGTGAT 

DALAKDSLAILQTYLDSQAD 
GATGCTCTTGCGAAAGACTCGTTAGCGATATTACAAACTTATCTCGATTCTCAAGCAGAT 

TYLCDRKETGCDLVEIRNPH 
ATATATTTATGTGACAGAAAAGAGACCGGGTGTGATTTAGTTGAGATTAGAAACCCTCAT 

RSWLRTDDELYVFNNNLDRE 
CGTTCTTGGCTCAGAACAGATGATGAACTTTATGTGTTTAATAATAATTTAGATAGGGAA 

TYLSRCLSIGGVFSYLSSLI 
ATCTATCTCAGTAGATGCTTATCTATTGGTGGTGTATTTAGCTATCTAAGTTCTTTAATA 

VKKERWDAIDFDASYIGTSY 
GTAAAAAAAGAACGATGGGATGCCATTGATTTTGATGCGTCCTATATTGGCACTTCCTAT 

PHVFIMMSVFNTPGCLLHYI 
CCTCATGTATTTATCATGATGAGCGTATTTAATACGCCAGGGTGCCTTTTGCATTATATA 

SKPLVICRGDNDSFEKKGKA
TCAAAACCACTCGTAATATGCCGAGGAGATAATGATAGTTTCGAGAAGAAAGGAAAGGCC 

RRILIDFIAYLKLANDFYSK 
AGACGAATTTTAATTGATTTTATTGCATATTTAAAATTAGCTAATGATTTTTACAGTAAA 

NISLKRAFENVLLKERPWLY
AATATATCTTTAAAACGAGCATTTGAAAATGTTTTGCTAAAAGAGAGACCATGGTTATAT 

TTLAMACYGNSDEKRDLSEF 
ACAACTTTGGCTATGGCATGTTATGGCAATAGTGATGAAAAAAGAGATTTATCTGAATTT 

YAKLGCNKNMINTVLRFGKL 
TATGCAAAGCTAGGTTGTAATAAAAATATGATCAACACTGTACTTCGATTTGGGAAACTA 
End of wbaV

AYAVKNITVLKNFTKRIIK*
GCATATGCAGTGAAAAATATTACCGTGCTTAAGAATTTTACTAAACGGATAATTAAG TAG 15060

TAGTAAGTTATTATATTGAGATTAAATGTAGATTTAACCTTTCTGGATTCAGCTAGATTT 15120 

ACGTTACTGACTTTTCTTTTTAATGAAAATCATATTTGATATATATAAATAAATTTGGAT 15180

AGCTTAACTACTTAGATGTTTTTTTCTGGGAATGTTAGTATAATAATATATTTCTTTATG 15240 

ATTGTTTTTGTAGTGTTTTACTGCCGGTATTACATTAACTCTATTATTAAGAATTACACC 15300 

TAGTGTAAGCTTCGTAATATTATTTATCCTTATGATTATTGCTTTAAAGATGCGTATGGA 15360

Start of wbaU

MIVNLSRLGKSGTG 

AAAACGGAGAGCTATTCAATGATCGTAAACCTATCACGTTTAGGTAAAAGTGGTACGGGA 15420 
MWQYSIKFLTALREIADVDA
ATGTGGCAATACTCGATTAAATTTTTAACGGCACTGCGAGAAATAGCTGATGTTGACGCA 

IICSKVHADYFEKLGYAVVT 
ATAATCTGTAGCAAGGTACACGCTGATTATTTTGAAAAGCTCGGTTATGCAGTAGTTACT 

VPNIVSNTSKTSRLRPLVWY
GTTCCGAATATTGTTAGCAACACATCAAAAACATCGCGACTTAGACCATTAGTATGGTAT 

VYSYWLALRVLIKFGNKKLV 
GTATATAGTTACTGGCTTGCGCTGAGGGTTTTAATTAAGTTTGGTAATAAAAAATTGGTG 

CTTHHTIPLLRNQTITVHDI
TGTACTACACATCACACTATCCCCTTACTGAGAAACCAAACGATAACCGTACATGATATA 

RPFYYPDSFIQKVYFRFLLK 
AGACCTTTTTATTATCCAGATAGTTTTATTCAGAAAGTGTATTTTCGCTTTTTATTAAAA 

MSVKRCKHVLTVSYTVKDSI
ATGTCCGTTAAGCGATGTAAGCATGTTTTAACGGTATCTTATACCGTTAAAGATAGCATT 

AKTYNVDSEKISVIYNSVNK 
GCTAAAACTTATAATGTAGATAGTGAGAAAATATCAGTAATTTATAATAGTGTTAATAAA 

SDFIQKKEKENYFLAVGASW 
TCTGATTTTATACAAAAAAAAGAAAAAGAGAATTACTTTTTAGCTGTTGGTGCAAGTTGG 

PHKNIHSFIKNKKVWSDSYN
CCACATAAAAATATTCATTCATTCATAAAAAATAAAAAAGTTTGGTCTGACTCTTATAAT 

LIIVCGRTDYAMSLQQMVVD
TTAATTATTGTATGTGGTCGTACTGACTATGCAATGTCTCTCCAACAAATGGTCGTTGAT 

LELKDKVTFLHEVSFNELKI 
CTGGAACTAAAAGATAAAGTGACTTTTTTACATGAAGTCTCATTTAATGAATTAAAGATT 

LYSKAYALVYPSIDEGFGIP
TTATATTCTAAAGCCTACGCGCTTGTTTATCCATCTATTGATGAGGGTTTTGGTATACCT 

PIEAMASNTPVIVSDIPVFH
CCTATTGAAGCGATGGCATCAAATACTCCAGTTATAGTGTCCGATATACCAGTATTTCAT 

EVLTNGALYVNPDDEKSWQS 
GAAGTGTTAACCAATGGTGCATTATATGTGAATCCGGATGATGAAAAAAGCTGGCAGAGT 
16140

16200 

16260 

16320 



AIKNIEQLPDAISRFNNYVA
GCAATTAAAAATATAGAGCAGTTGCCTGATGCAATTTCCCGATTTAACAACTATGTCGCA 16380

End of wbaU

RYDFDNMKQMVGNWLAESK * 
CGGTATGACTTTGATAATATGAAGCAGATGGTTGGCAATTGGTTGGCGGAATCAAAA TAA 16440 



Start of wbaN

MKITLIIPTYNAGSLWPNVL
ATGAAAATAACATTAATTATTCCCACATATAATGCAGGGTCGCTTTGGCCTAATGTTCTG 

DAIKQQTIYPDKLIVIDSGS
GATGCGATTAAGCAGCAAACTATATATCCGGATAAATTGATTGTTATAGACTCAGGTTCT 

KDETVPLASDLKNISIFNID 
AAAGATGAAACGGTTCCGTTAGCCTCAGACCTGAAAAATATATCAATATTTAATATTGAC 



16500 
16560 
16620 



SKDFNHGGTRNLAVAKTLDA
TCTAAAGATTTTAATCATGGAGGAACCAGAAATTTAGCAGTTGCAAAAACTCTGGACGCT 16680
DVIIFLTQDAILADSDAIKN
GATGTTATAATTTTTCTAACGCAAGATGCAATTCTCGCGGATTCGGATGCAATTAAAAAT 

LVYYFSDPLIAAVCGRQLPH 
TTGGTTTATTATTTTTCAGATCCATTGATAGCAGCGGTTTGTGGTAGACAACTTCCTCAT 

KDANPLAVHARNFNYSSKSI
AAAGATGCTAATCCCCTTGCAGTGCATGCCAGAAATTTTAATTATAGTTCAAAATCTATT 

VKSKADIEKLGIKTVFMSNS 
GTTAAAAGTAAGGCAGATATAGAAAAATTGGGTATTAAAACTGTATTTATGTCCAATTCT 



LAEDMFMAAKMIQAGYKVAY 
CTTGCCGAGGATATGTTTATGGCGGCTAAGATGATTCAGGCGGGTTATAAGGTCGCCTAC 

CAEAVVRHSHNYTPREEFQR 
TGCGCTGAAGCGGTGGTAAGACACTCCCATAATTATACCCCGCGAGAAGAGTTTCAACGA 



AGGEGFRFVKSEIQFLLKNA
GCCGGTGGTGAGGGTTTCCGCTTCGTAAAATCAGAGATTCAATTCCTGCTTAAAAATGCA 



GKHWQSLPLSTCRYFSMYKS 
GGCAAGCATTGGCAATCTTTACCGTTGTCTACATGTCGCTATTTTAGCATGTACAAGAGT 



KQFLSVEGKLSMLQNTIKRL 
AAGCAGTTTCTAAGCGTTGAAGGTAAACTATCAATGCTGCAAAATACTATAAAGCGATTA 



AEQLREIDKLANNIILEPVG
GCTGAACAACTCCGTGAAATTGACAAGTTAGCAAATAATATTATTCTCGAACCGGTAGGC 

RNTAPAIALAAFCALQNADN 
CGTAATACTGCACCAGCGATCGCTCTTGCCGCGTTTTGTGCGCTCCAGAATGCTGATAAT 



TKAVRHAEEYAANGKLVTFG
ACGAAAGCTGTCAGACATGCTGAAGAATACGCTGCAAATGGTAAGCTTGTAACTTTTGGT 

IVPTHAETGYGYIRRGELIG
ATTGTTCCAACGCATGCTGAAACGGGTTATGGATATATTCGTCGTGGTGAGTTGATAGGA 



16740 
16800 
16860 
16920 



FAAYRRSVFEELSGFPEHTI 
TTTGCTGCCTATCGCCGTTCCGTTTTTGAAGAGTTAAGTGGGTTTCCTGAACATACAATT 16980 



17040 
17100 



YFDTGVFHACSPWIQRDFGG
TATTTTGATACTGGTGTATTTCATGCTTGTTCTCCGTGGATTCAGCGTGACTTTGGCGGA 17160 



17220 



PFWIPRALLTTFAKFLGYKL
CCGTTCTGGATTCCAAGAGCTTTATTAACAACCTTTGCTAAATTCTTGGGTTACAAATTA 17280



17340 



End of wbaN Start of manC 

YWNNIQYSSSKEIK*MSFLP
TATTGGAATAATATCCAATATTCTTCGTCAAAAGAGATAAAA TAAATGTCTTTTCTTCCC 17400

VIMAGGTGSRLWPLSREYHP 
GTAATTATGGCTGGCGGCACAGGTAGCCGTTTATGGCCGCTTTCACGCGAATATCATCCG 17460



17520 



ASLSTEEPVVICNDRHRFLV 
GCTTCACTTTCTACAGAAGAACCCGTTGTCATTTGCAATGACAGACACCGTTTCTTAGTC 17580



17640 
17700 



ADPLLLVLAADHVIQDEIAF 
GCTGATCCTCTTTTGTTGGTTCTTGCTGCAGATCATGTGATTCAGGATGAAATAGCTTTT 17760



17820 
17880 



NDAYAVAEFVEKPDIDTAGD 
AATGACGCTTATGCAGTGGCTGAATTTGTGGAGAAACCGGATATCGATACCGCCGGTGAC 17940

YFKSGKYYWNSGMFLFRASS
TATTTCAAATCAGGGAAATATTACTGGAATAGCGGTATGTTTTTATTTCGTGCAAGCTCT 18000
YLNELKYLSPEIYKACEKAV
TATTTAAACGAATTAAAGTATTTATCACCTGAAATTTATAAAGCTTGTGAAAAGGCGGTA 18060 



GHINPDLDFIRIDKEEFMSC 
GGACATATAAATCCCGATCTTGATTTTATTCGTATTGATAAAGAAGAGTTTATGTCATGC 

PSDSIDYAVMEHTQHAVVIP 
CCGAGTGATTCTATCGATTATGCAGTTATGGAGCACACACAGCATGCGGTGGTGATACCA 



18120 
18180 



MSAGWSDVGSWSSLWDISNK
ATGAGCGCTGGCTGGTCGGATGTGGGTTCCTGGTCCTCACTTTGGGATATATCGAATAAA 18240 

DHQRNVLKGDIFAHACNDNY 
GATCATCAGAGAAATGTTTTAAAAGGAGATATTTTCGCACATGCTTGTAATGATAATTAC 18300

IYSEDMFISAIGVSNLVIVQ 
ATTTATTCCGAAGATATGTTTATAAGTGCGATTGGTGTAAGCAATCTTGTCATTGTTCAA 18360

TTDALLVANKDTVQDVKKIV 
ACAACAGACGCTTTACTGGTGGCTAATAAAGATACAGTACAAGATGTTAAAAAAATTGTC 18420

DYLKRNDRNEYKQHQEVFRP 
GATTATTTAAAACGGAATGATAGGAACGAATATAAACAACATCAAGAAGTTTTCCGCCCC 18480

WGKYNVIDSGKNYLVRCITV 
TGGGGAAAATATAATGTGATTGATAGCGGCAAAAATTACCTCGTTCGATGTATCACTGTT 18540

KPGEKFVAQMHHHRAEHWIV
AAGCCGGGTGAGAAATTTGTGGCGCAGATGCATCACCACCGGGCTGAGCATTGGATAGTA 18600

LSGTARVTKGEQTYMVSENE 
TTATCCGGGACTGCTCGTGTTACAAAGGGAGAGCAGACTTATATGGTTTCTGAAAATGAA 18660 

STFIPPNTIHALENPGMTPL 
TCAACATTTATTCCTCCGAATACTATTCACGCGCTGGAAAATCCTGGAATGACCCCCCTG 18720

KLIEIQSGTYLGEDDIIRLE
AAGTTAATTGAGATTCAATCAGGTACCTATCTTGGTGAGGATGATATTATTCGTTTAGAA 18780

Start of manB End of manC

MNVVNNSRDV 

QRSGFSKEWTNERS * 
CAACGTTCTGGATTTTCGAAGGAGTGGACTA ATGAACGTAGT TAATAATAGCCGTGATGT 18840

IYSSGIVFGTSGARGLVKDF 
TATTTATTCATCAGGTATTGTGTTTGGAACGAGTGGGGCTCGCGGTCTTGTAAAAGATTT 18900 

TPQVCAAFTVSFVAVMQEHF 
TACACCTCAGGTATGTGCTGCTTTTACGGTTTCATTTGTTGCCGTTATGCAGGAACATTT 18960 

SFDTVALAIDNRPSSYGMAQ
TTCCTTTGATACCGTAGCATTGGCAATAGATAATCGTCCAAGTAGTTATGGGATGGCTCA 19020

ACAAALADKGVNCIFYGVVP
GGCGTGTGCTGCTGCATTGGCGGATAAAGGCGTTAACTGTATTTTTTATGGAGTGGTACC 19080

TPALAFQSMSDNMPAIMVTG 
AACCCCAGCTTTGGCCTTTCAGTCTATGTCTGACAATATGCCTGCGATAATGGTTACGGG 19140 

SHIPFERNGLKFYRPDGEIT 
AAGTCATATTCCATTCGAGCGGAACGGCCTCAAGTTTTATCGTCCTGATGGTGAAATCAC 19200

KHDEAAILSVEDTCSHLELK
GAAACATGATGAGGCTGCGATCCTTAGTGTTGAAGATACGTGCAGCCATTTAGAGCTTAA 19260 
ELIVSEMAAVNYISRYTSLF
AGAACTCATAGTTTCAGAAATGGCTGCTGTTAATTATATATCTCGTTATACATCTTTATT 19320

STPFLKNKRIGIYEHSSAGR
TTCTACTCCATTCCTGAAAAATAAGCGTATTGGTATTTACGAACATTCAAGCGCTGGGCG 19380

DLYKPLFIALGAEVVSLGRS 
TGATCTTTATAAGCCTTTATTTATTGCATTGGGGGCTGAAGTCGTTAGCTTGGGTAGAAG 19440 

DNFVPIDTEAVSKEDREKAR 
CGATAATTTTGTACCTATAGATACAGAGGCTGTAAGCAAAGAGGATCGGGAAAAAGCTCG 19500 

SWAKEFDLDAIFSTDGDGDR 
CTCATGGGCTAAAGAGTTCGATTTAGATGCCATATTCTCGACAGATGGGGATGGTGATCG 19560

PLIADEAGEWLRGDILGLLC
CCCTCTTATTGCTGATGAGGCCGGTGAGTGGCTAAGAGGCGATATACTAGGTCTATTATG 19620

SLALDAEAVAIPVSCNSIIS
TTCACTTGCATTGGATGCAGAAGCCGTCGCTATTCCTGTTAGTTGTAACAGCATAATTTC 19680

SGRFFKHVKLTKIGSPYVIE
TTCTGGCCGCTTTTTTAAACATGTTAAGCTTACAAAAATTGGCTCGCCTTATGTTATCGA 19740

AFNELSRSYSRIVGFEANGG
AGCTTTTAATGAATTATCGCGGAGTTATAGTCGTATTGTCGGTTTTGAAGCCAATGGCGG 19800

FLLGSDICINEQNLHALPTR 
TTTTTTATTAGGAAGCGACATCTGTATTAACGAGCAGAATCTTCATGCCTTACCAACTCG 19860 

DAVLPAIMLLYKSRNTSISA
TGATGCTGTATTACCAGCAATAATGCTGCTTTACAAAAGTAGGAATACCAGCATTAGCGC 19920

LVNELPTRYTHSDRLQGITT
TTTAGTCAATGAACTCCCAACTCGTTACACCCATTCTGACAGATTACAGGGGATTACAAC 19980 

DKSQSLISMGRENLSNLLSY
TGATAAAAGTCAATCCTTAATTAGTATGGGCAGAGAAAATCTGAGCAACCTCTTAAGCTA 20040

IGLENEGAISTDMTDGMRIT
TATTGGTTTGGAGAATGAAGGTGCAATTTCTACAGATATGACAGATGGTATGCGAATTAC 20100

LRDGCIVHLRASGNAPELRC
TTTACGTGATGGATGTATTGTGCATTTGCGCGCTTCTGGTAATGCACCTGAGTTACGCTG 20160

YAEANLLNRAQDLVNTTLAN 
CTATGCAGAAGCTAATTTATTAAATAGGGCTCAGGATCTTGTAAATACAACGCTTGCTAA 20220 

End of manB 

IKKRCLL* 
TATTAAAAAACGATGCTTGCTG TAAAAAAATTGAATGTTATTTACTTAATATGCCTATTT 20280

Start of wbaP

MDNIDNKY 
TATTTACATTATGCACGGTCAGAGGGTGAGGATTAA ATCGATAATATTGATAATAAGTAT 20340

NPQLCKIFLAISDLIFFNLA
AATCCACAGCTATGTAAAATTTTTTTGGCTATATCGGATTTGATTTTTTTTAATTTAGCC 20400

LWFSLGCVYFIFDQVQRFIP
TTATGGTTTTCATTAGGATGTGTCTATTTTATTTTTGATCAAGTACAGCGATTTATTCCT 20460

QDQLDTRVITHFILSVVCVG 
CAAGACC AATTAGATACAAGAGTTATTACGCATTTTATTTTGTCAGTAGTATGTGTCGGT 20520 
WFWIRLRHYTIRKPFWYELK
TGGTTTTGGATTCGTTTGCGACATTATACTATCCGCAAGCCATTTTGGTATGAGTTAAAA 

EIFRTIVIFAIFDLALIAFT 
GAAATTTTTCGTACGATCGTTATTTTTGCTATATTTGATTTGGCTCTGATAGCGTTTACA 

KWQFSRYVWVFCWTFALILV
AAATGGCAGTTTTCACGCTATGTCTGGGTGTTTTGTTGGACTTTTGCCCTAATCCTGGTG 

PFFRALTKHL LNKLGIWKKK 
CCTTTTTTTCGCGCACTTACAAAGCATTTATTGAACAAGCTAGGTATCTGGAAGAAAAAA 

TI ILGSGQNARGAYSALQSE 
ACTATCATCCTGGGGAGCGGACAGAATGCTCGTGGTGCATATTCTGCGCTGCAAAGTGAG 

EMMGFDVIAF FDTDASDAEI 
GAGATGATGGGGTTTGATGTTATCGCTTTTTTTGATACGGATGCGTCAGATGCTGAAATA 

NMLPVIKDTEI IWDLNRTGD 
AATATGTTGCCGGTGATAAAGGATACTGAGATTATTTGGGATTTAAATCGTACAGGTGAT 



ELSKHHCRSVTVVPSFRGLP 
GAACTTTCAAAACATCATTGTCGTTCTGTTACTGTAGTCCCCTCGTTTAGAGGATTGCCA 

LYNTDMSFIF SHEVML LR IQ 
TTATATAATACTGATATGTCTTTTATCTTTAGCCATGAAGTTATGTTATTAAGGATACAA 

NNLAKRSSRFLKRTFDIVCS 
AATAACTTGGCTAAAAGGTCGTCCCGTTTTCTCAAACGGACATTTGATATTGTTTGTTCA 

IMILIIASPLMIYLWYKVTR 
ATAATGATTCTTATAATTGCATCACCACTTATGATTTATCTGTGGTATAAAGTTACTCGA 

DGGPAIYGHQRVGRHGKLFP 
GATGGTGGTCCGGCTATTTATGGTCACCAGCGAGTAGGTCGGCATGGAAAACTTTTTCCA 

CYKFRSMVMN SQEVLKEL LA 
TGCTACAAATTTCGTTCTATGGTTATGAATTCTCAAGAGGTACTAAAAGAACTTTTGGCT 

NDPIARAEWEKDFKLKNDPR 
AACGATCCTATTGCCAGGGCTGAATGGGAGAAAGATTTTAAACTGAAAAATGATCCTCGA 

ITAVGRFIRKTSLDELPQLF 
ATCACAGCTGTAGGTCGATTTATACGTAAAACTAGCCTTGATGAGTTGCCACAACTTTTT 

NVLKGDMSLVGPRPIVSDEL 
AATGTACTAAAAGGTGATATGAGCCTGGTTGGACCACGACCTATCGTTTCGGATGAACTG 

ERYCDDVDYYLMAKPGMTGL 
GAGCGTTATTGTGATGATGTTGATTATTATTTGATGGCAAAGCCGGGCATGACAGGTCTA 

WQVSGRNDVDYDTRVYFDSW 
TGGCAAGTGAGTGGGCGTAATGATGTTGATTATGACACTCGTGTTTATTTTGATTCCTGG 



20580 
20640 
20700 
20760 
20820 
20880 
20940 



VHYILAYEYTELEKTHFWLR 
GTCCATTATATCCTTGCTTATGAATACACCGAGTTGGAGAAAACAC ATTTTTGGCTACGT 21000 



21060 

21120 

21180 

21240 

21300 

21360 

21420 

21480 

21540 

21600 

21660 



YVKNWTLWND IAI LFKTAKV 
TATGTTAAAAACTGGACGCTTTGGAATGATATTGCCATTCTGTTTAAAACAGCGAAAGTT 21720 

End of wbdP

VLRRDGAY*
GTTTTGCGGCGAGATGGTGCGTATTAAGCTTACCGAGAAGTACTGAATAATAATTGTATA 21780
AATTAGCCTGCGTAAAATCTGAACGCATCAATCGCTACCTTAATATCATACCTTTGAGTT 21840 



Figure 10/16 



WO 98/50531 PCT/AU98/00315


AACATACTATTCACCTTTAACCTGCCATGACCGTTTGTGGCAGGGTTTCCACACCTGACA 21900
GGAGTATGTAATGTCCAAGCAACAGATCGGCGTCGTCGGTATGGCAGTGATGGGGCGCAA 21960
CCTCGCGCTCAACATCGAAAGCCGTGGTTATACCGTCTCCGTTTTCAACCGCTCCCGTGA 22020
AAAGACCGAAGAAGTGATTGCCGAGAATCCCGGCAAAAAGCTGGTGCCTTATTACACGGT 22080